在Python中对字符串前缀执行二进制搜索

本文介绍了在Python中对字符串前缀执行二进制搜索的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我想在字符串的排序列表中搜索以给定子字符串开头的所有元素.

I want to search a sorted list of strings for all of the elements that start with a given substring.

以下是查找所有完全匹配项的示例:

Here's an example that finds all of the exact matches:

import bisect
names = ['adam', 'bob', 'bob', 'bob', 'bobby', 'bobert', 'chris']
names.sort()
leftIndex = bisect.bisect_left(names, 'bob')
rightIndex = bisect.bisect_right(names, 'bob')
print(names[leftIndex:rightIndex])

哪个打印 ['bob'，'bob'，'bob'] .

相反，我想搜索所有以开头的名称.我想要的输出是 ['bob'，'bob'，'bob'，'bobby'，'bobert'] .如果我可以修改二等分搜索的比较方法，则可以使用 name.startswith('bob')进行此操作.

Instead, I want to search for all the names that start with 'bob'. The output I want is ['bob', 'bob', 'bob', 'bobby', 'bobert']. If I could modify the comparison method of the bisect search, then I could use name.startswith('bob') to do this.

例如，在Java中这很容易.我会用:

As an example, in Java it would be easy. I would use:

Arrays.binarySearch(names, "bob", myCustomComparator);

其中"myCustomComparator"是一个利用startswith方法(以及一些其他逻辑)的比较器.

where 'myCustomComparator' is a comparator that takes advantage of the startswith method (and some additional logic).

如何在Python中做到这一点?

How do I do this in Python?

推荐答案

bisect 欺骗到使用自定义比较:

bisect can be fooled into using a custom comparison by using an instance that uses the custom comparator of your chosing:

>>> class PrefixCompares(object):
...     def __init__(self, value):
...         self.value = value
...     def __lt__(self, other):
...         return self.value < other[0:len(self.value)]
... 
>>> import bisect
>>> names = ['adam', 'bob', 'bob', 'bob', 'bobby', 'bobert', 'chris']
>>> names.sort()
>>> key = PrefixCompares('bob')
>>> leftIndex = bisect.bisect_left(names, key)
>>> rightIndex = bisect.bisect_right(names, key)
>>> print(names[leftIndex:rightIndex])
['adam', 'bob', 'bob', 'bob', 'bobby', 'bobert']
>>>

DOH.右边的二等分有效，但是左边的二等分显然没有."adam"不以"bob"为前缀！要修复它，您还必须调整顺序.

DOH. the right bisect worked, but the left one obviously didn't. "adam" is not prefixed with "bob"!. to fix it, you have to adapt the sequence, too.

>>> class HasPrefix(object):
...     def __init__(self, value):
...         self.value = value
...     def __lt__(self, other):
...         return self.value[0:len(other.value)] < other.value
... 
>>> class Prefix(object):
...     def __init__(self, value):
...         self.value = value
...     def __lt__(self, other):
...         return self.value < other.value[0:len(self.value)]
... 
>>> class AdaptPrefix(object):
...     def __init__(self, seq):
...         self.seq = seq
...     def __getitem__(self, key):
...         return HasPrefix(self.seq[key])
...     def __len__(self):
...         return len(self.seq)
... 
>>> import bisect
>>> names = ['adam', 'bob', 'bob', 'bob', 'bobby', 'bobert', 'chris']
>>> names.sort()
>>> needle = Prefix('bob')
>>> haystack = AdaptPrefix(names)
>>> leftIndex = bisect.bisect_left(haystack, needle)
>>> rightIndex = bisect.bisect_right(haystack, needle)
>>> print(names[leftIndex:rightIndex])
['bob', 'bob', 'bob', 'bobby', 'bobert']
>>>

这篇关于在Python中对字符串前缀执行二进制搜索的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持！