Counting the frequency of letters in a text file

Problem description

In Python, how can I iterate through a text file and count the number of occurrences of each letter? I realise I could just use a 'for x in file' statement to go through it and then set up 26 or so if/elif statements, but surely there is a better way to do it?

Thanks.

Solution

Use collections.Counter():

from collections import Counter

with open(file) as f:           # file is the path to the text file
    c = Counter()
    for line in f:
        c += Counter(line)      # add this line's character counts to the running total
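
Note that Counter(line) counts every character, including spaces and newlines. If only letters matter and upper and lower case should be treated alike, a minimal sketch along the following lines might work. The function name count_letters and the UTF-8 encoding are assumptions made for illustration, not part of the original answer:

```python
from collections import Counter

def count_letters(path):
    """Count letters in a text file, ignoring case and non-letter characters.

    Hypothetical helper: the name and the UTF-8 encoding are assumptions.
    """
    counts = Counter()
    with open(path, encoding="utf-8") as f:
        for line in f:
            # Lower-case the line, then keep only alphabetic characters.
            counts.update(ch for ch in line.lower() if ch.isalpha())
    return counts
```

Calling count_letters with the path to a file returns a Counter keyed by lower-case letter. Using update() here avoids building a temporary Counter for every line.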

If the file is not too large, you can read all of it into memory as a string and convert it into a Counter object in one line of code:

with open(file) as f:
    c = Counter(f.read())

Example:

>>> c = Counter()
>>> c += Counter('aaabbbcccddd eee fff ggg')
>>> c
Counter({'a': 3, ' ': 3, 'c': 3, 'b': 3, 'e': 3, 'd': 3, 'g': 3, 'f': 3})
>>> c += Counter('aaabbbccc')
>>> c
Counter({'a': 6, 'c': 6, 'b': 6, ' ': 3, 'e': 3, 'd': 3, 'g': 3, 'f': 3})
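
Since the question is about frequencies, it may help to turn the raw counts into proportions. The following is a rough sketch using the same sample string as above; the variable names are illustrative only:

```python
from collections import Counter

text = "aaabbbcccddd eee fff ggg"

# Count letters only, so the spaces do not skew the proportions.
counts = Counter(ch for ch in text if ch.isalpha())
total = sum(counts.values())

# Relative frequency of each letter, as a fraction of all letters.
freqs = {letter: n / total for letter, n in counts.items()}

# most_common(n) returns the n highest counts as (letter, count) pairs.
top_two = counts.most_common(2)

for letter, share in sorted(freqs.items()):
    print(f"{letter}: {share:.3f}")
```

Counter.most_common() is handy when only the most frequent letters are of interest.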

Or use the count() method of strings, which here counts lowercase letters only:

from string import ascii_lowercase     # ascii_lowercase == 'abcdefghijklmnopqrstuvwxyz'
with open(file) as f:
    text = f.read().strip()
    dic = {}
    for x in ascii_lowercase:
        dic[x] = text.count(x)
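
The count() approach can also be written as a dict comprehension. The sketch below lower-cases the text first so that capital letters are included; the helper name letter_counts is hypothetical and the sample string is only for demonstration:

```python
from string import ascii_lowercase

def letter_counts(text):
    """Return a dict mapping each of the 26 ASCII letters to its count in text.

    Hypothetical helper; lower-casing first makes the count case-insensitive.
    """
    lowered = text.lower()
    # str.count() scans the whole string once per letter, i.e. 26 passes.
    return {letter: lowered.count(letter) for letter in ascii_lowercase}

counts = letter_counts("Hello, World")
```

Unlike the Counter approach, this always yields all 26 keys, with a count of 0 for letters that never appear, at the cost of scanning the text once per letter.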
