Problem description
Right now I am able to count the frequency of each word in a list.
>>> words = ['a', 'b', 'a', 'c', 'a', 'c']
>>> frequency = {}
>>> for w in words:
...     frequency[w] = frequency.get(w, 0) + 1
...
>>> frequency
{'a': 3, 'b': 1, 'c': 2}
But what I'd like it to give me is the frequency of pairs for each list item. For example, 'b' comes after 'a' 1 time and 'c' comes after 'a' 2 times:
{'a': {'b': 1, 'c': 2}, 'b': {'a': 1}, 'c': {'a': 1}}
How would I go about accomplishing this?
If you're willing to accept a slightly different format, it's easy to get the pairwise counts using collections.Counter and zip:
>>> seq = list("abacac")
>>> from collections import Counter
>>> c = Counter(zip(seq, seq[1:]))
>>> c
Counter({('a', 'c'): 2, ('b', 'a'): 1, ('c', 'a'): 1, ('a', 'b'): 1})
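To make the zip step above easier to follow, here is a minimal sketch of the same idea as a standalone script. The names pairs and pair_counts are illustrative and not part of the original answer.

```python
from collections import Counter

seq = list("abacac")

# zip stops at the shorter input, so each element is paired with its successor
pairs = list(zip(seq, seq[1:]))
print(pairs)
# [('a', 'b'), ('b', 'a'), ('a', 'c'), ('c', 'a'), ('a', 'c')]

pair_counts = Counter(pairs)
print(pair_counts[('a', 'c')])  # 2
print(pair_counts[('c', 'b')])  # 0: a Counter returns 0 for pairs it never saw
```

Because the keys are (first, second) tuples, looking up how often one item follows another is a single indexing operation.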
If you really want the format you gave, you have a few options, but one way would be to use itertools.groupby to collect all the pairs starting with the same element together:
>>> from itertools import groupby
>>> grouped = groupby(sorted(zip(seq, seq[1:])), lambda x: x[0])
>>> {k: dict(Counter(x[1] for x in g)) for k,g in grouped}
{'a': {'c': 2, 'b': 1}, 'c': {'a': 1}, 'b': {'a': 1}}
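As one of the other options, the nested format can probably also be built without sorting by using collections.defaultdict. This is a different technique from the groupby approach above, sketched here under the assumption that the input is a list or other sliceable sequence; the function name successor_counts is hypothetical.

```python
from collections import Counter, defaultdict

def successor_counts(seq):
    """Map each element to a dict counting the elements that directly follow it."""
    nested = defaultdict(Counter)
    for current, following in zip(seq, seq[1:]):
        nested[current][following] += 1
    # Convert to plain dicts to match the format requested in the question
    return {key: dict(counter) for key, counter in nested.items()}

print(successor_counts(list("abacac")))
# {'a': {'b': 1, 'c': 2}, 'b': {'a': 1}, 'c': {'a': 1}}
```

The output shown assumes Python 3.7 or later, where dicts preserve insertion order; on older versions the key order may differ, but the counts are the same.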