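I want to implement a counter that drops its least frequent elements once its size exceeds some threshold. To do that, I need a way to remove the least frequent element.
What is the fastest way to do this in Python?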
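I know about most_common(), but it builds a complete list and seems slow when called frequently. Is there a better command (or perhaps a different data structure)?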
Best answer
You can implement least_common by borrowing the implementation of most_common and making the necessary changes.
For reference, here is the collections source in Py2.7:
def most_common(self, n=None):
    '''List the n most common elements and their counts from the most
    common to the least. If n is None, then list all element counts.
    >>> Counter('abcdeabcdabcaba').most_common(3)
    [('a', 5), ('b', 4), ('c', 3)]
    '''
    # Emulate Bag.sortedByCount from Smalltalk
    if n is None:
        return sorted(self.iteritems(), key=_itemgetter(1), reverse=True)
    return _heapq.nlargest(n, self.iteritems(), key=_itemgetter(1))
To make it retrieve the least common elements instead, only a couple of tweaks are needed.
import collections
from operator import itemgetter as _itemgetter
import heapq as _heapq

class MyCounter(collections.Counter):
    def least_common(self, n=None):
        # iteritems() is Python 2 only; on Python 3 use self.items() instead.
        if n is None:
            return sorted(self.iteritems(), key=_itemgetter(1), reverse=False)  # was: reverse=True
        return _heapq.nsmallest(n, self.iteritems(), key=_itemgetter(1))  # was: _heapq.nlargest
Test:
c = MyCounter("abbcccddddeeeee")
assert c.most_common() == c.least_common()[::-1]
assert c.most_common()[-1:] == c.least_common(1)
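To tie this back to the original question, here is a minimal Python 3 sketch of how the same heapq.nsmallest idea might be used to prune a counter that has grown past a threshold. The helper names least_common and prune, and the max_size parameter, are hypothetical and not part of collections; ties between equally rare elements are resolved by whatever order the counter yields them.

```python
import heapq
from collections import Counter
from operator import itemgetter


def least_common(counter, n=None):
    """Return (element, count) pairs ordered from least to most common."""
    if n is None:
        return sorted(counter.items(), key=itemgetter(1))
    # nsmallest avoids sorting every item when only a few are needed.
    return heapq.nsmallest(n, counter.items(), key=itemgetter(1))


def prune(counter, max_size):
    """Delete the least frequent elements until at most max_size remain.

    max_size is a hypothetical threshold chosen by the caller.
    """
    excess = len(counter) - max_size
    if excess > 0:
        # nsmallest returns a list, so the counter is not mutated
        # while it is being iterated.
        for element, _count in least_common(counter, excess):
            del counter[element]


counts = Counter("abbcccddddeeeee")
prune(counts, max_size=3)  # drops 'a' and 'b', the two rarest elements
```

Pruning only when the threshold is exceeded, and removing all excess elements in one nsmallest call, keeps the cost to one pass over the counter per prune rather than one per removed element.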