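I want to implement a counter that discards its least frequent elements once its size exceeds some threshold. To do that, I need a way to remove the least frequent element.
What is the fastest way to do this in Python?
I know of one way, but it builds a complete list and seems slow when run many times. Is there a better call (or perhaps a different data structure)?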

Best answer

You can implement least_common by borrowing the implementation of most_common and making the necessary changes.
See the collections source in Py2.7:

def most_common(self, n=None):
    '''List the n most common elements and their counts from the most
    common to the least.  If n is None, then list all element counts.

    >>> Counter('abcdeabcdabcaba').most_common(3)
    [('a', 5), ('b', 4), ('c', 3)]

    '''
    # Emulate Bag.sortedByCount from Smalltalk
    if n is None:
        return sorted(self.iteritems(), key=_itemgetter(1), reverse=True)
    return _heapq.nlargest(n, self.iteritems(), key=_itemgetter(1))

To change it so that it retrieves the least common elements instead, only a few tweaks are needed:
import collections
from operator import itemgetter as _itemgetter
import heapq as _heapq


class MyCounter(collections.Counter):
    def least_common(self, n=None):
        if n is None:
            return sorted(self.iteritems(), key=_itemgetter(1), reverse=False)  # was: reverse=True
        return _heapq.nsmallest(n, self.iteritems(), key=_itemgetter(1))  # was _heapq.nlargest
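
The class above targets Python 2.7, where dictionaries have iteritems(). On Python 3 that method no longer exists, so a minimal sketch of the same idea, assuming Python 3 and keeping the name MyCounter, could use items() instead:

```python
import collections
import heapq
from operator import itemgetter


class MyCounter(collections.Counter):
    def least_common(self, n=None):
        """List (element, count) pairs from the least common to the most.

        If n is None, list all pairs; otherwise only the n least common.
        """
        if n is None:
            # Ascending sort by count: the mirror image of most_common().
            return sorted(self.items(), key=itemgetter(1))
        # nsmallest avoids sorting the whole counter when n is small.
        return heapq.nsmallest(n, self.items(), key=itemgetter(1))
```

Elements with equal counts come back in the counter's iteration order, so the order among ties is not the exact reverse of most_common().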

Test:
c = MyCounter("abbcccddddeeeee")
assert c.most_common() == c.least_common()[::-1]
assert c.most_common()[-1:] == c.least_common(1)
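
For the use case in the question (dropping the least frequent elements once the counter grows past a threshold), the same heapq.nsmallest call can drive the eviction. The following is only a sketch for Python 3; the function name prune_least_common and the parameter max_size are hypothetical, not part of collections:

```python
import collections
import heapq
from operator import itemgetter


def prune_least_common(counter, max_size):
    """Delete the least frequent keys until at most max_size keys remain.

    Hypothetical helper: returns the (element, count) pairs that were removed.
    """
    excess = len(counter) - max_size
    if excess <= 0:
        return []
    # nsmallest returns a new list, so deleting from the counter below is safe.
    victims = heapq.nsmallest(excess, counter.items(), key=itemgetter(1))
    for key, _count in victims:
        del counter[key]
    return victims


counts = collections.Counter("abbcccddddeeeee")
dropped = prune_least_common(counts, 3)
```

Each call still scans every item once, so pruning in batches (letting the counter overshoot the threshold a little before trimming) is likely cheaper than pruning after every single update.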
