问题描述
我想在字符串的排序列表中搜索以给定子字符串开头的所有元素.
I want to search a sorted list of strings for all of the elements that start with a given substring.
以下是查找所有完全匹配项的示例:
Here's an example that finds all of the exact matches:
import bisect
names = ['adam', 'bob', 'bob', 'bob', 'bobby', 'bobert', 'chris']
names.sort()
leftIndex = bisect.bisect_left(names, 'bob')
rightIndex = bisect.bisect_right(names, 'bob')
print(names[leftIndex:rightIndex])
哪个打印 ['bob','bob','bob']
.
相反,我想搜索所有以开头的名称.我想要的输出是 ['bob','bob','bob','bobby','bobert']
.如果我可以修改二等分搜索的比较方法,则可以使用 name.startswith('bob')
进行此操作.
Instead, I want to search for all the names that start with 'bob'. The output I want is ['bob', 'bob', 'bob', 'bobby', 'bobert']
. If I could modify the comparison method of the bisect search, then I could use name.startswith('bob')
to do this.
例如,在Java中这很容易.我会用:
As an example, in Java it would be easy. I would use:
Arrays.binarySearch(names, "bob", myCustomComparator);
其中"myCustomComparator"是一个利用startswith方法(以及一些其他逻辑)的比较器.
where 'myCustomComparator' is a comparator that takes advantage of the startswith method (and some additional logic).
如何在Python中做到这一点?
How do I do this in Python?
推荐答案
bisect
欺骗到使用自定义比较:
bisect
can be fooled into using a custom comparison by using an instance that uses the custom comparator of your chosing:
>>> class PrefixCompares(object):
... def __init__(self, value):
... self.value = value
... def __lt__(self, other):
... return self.value < other[0:len(self.value)]
...
>>> import bisect
>>> names = ['adam', 'bob', 'bob', 'bob', 'bobby', 'bobert', 'chris']
>>> names.sort()
>>> key = PrefixCompares('bob')
>>> leftIndex = bisect.bisect_left(names, key)
>>> rightIndex = bisect.bisect_right(names, key)
>>> print(names[leftIndex:rightIndex])
['adam', 'bob', 'bob', 'bob', 'bobby', 'bobert']
>>>
DOH.右边的二等分有效,但是左边的二等分显然没有."adam"不以"bob"为前缀!要修复它,您还必须调整顺序.
DOH. the right bisect worked, but the left one obviously didn't. "adam" is not prefixed with "bob"!. to fix it, you have to adapt the sequence, too.
>>> class HasPrefix(object):
... def __init__(self, value):
... self.value = value
... def __lt__(self, other):
... return self.value[0:len(other.value)] < other.value
...
>>> class Prefix(object):
... def __init__(self, value):
... self.value = value
... def __lt__(self, other):
... return self.value < other.value[0:len(self.value)]
...
>>> class AdaptPrefix(object):
... def __init__(self, seq):
... self.seq = seq
... def __getitem__(self, key):
... return HasPrefix(self.seq[key])
... def __len__(self):
... return len(self.seq)
...
>>> import bisect
>>> names = ['adam', 'bob', 'bob', 'bob', 'bobby', 'bobert', 'chris']
>>> names.sort()
>>> needle = Prefix('bob')
>>> haystack = AdaptPrefix(names)
>>> leftIndex = bisect.bisect_left(haystack, needle)
>>> rightIndex = bisect.bisect_right(haystack, needle)
>>> print(names[leftIndex:rightIndex])
['bob', 'bob', 'bob', 'bobby', 'bobert']
>>>
这篇关于在Python中对字符串前缀执行二进制搜索的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!