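I need to display the 10 most frequently used words in a text file, ordered from most to least frequent, along with how many times each one is used. I am not allowed to use dictionaries or Counter. So far, I have this: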

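import urllib.request

cnt = 0
i = 0
# urlopen() yields each line of the response as bytes, so decode before splitting
txtFile = urllib.request.urlopen("http://textfiles.com/etext/FICTION/alice30.txt")
uniques = []
for line in txtFile:
    words = line.decode("latin-1").split()
    # Collect each distinct word once, in order of first appearance
    for word in words:
        if word not in uniques:
            uniques.append(word)
# Incomplete counting attempt: after the loop above, words holds only the
# last line's words, i is never reset between words, and cnt is a single
# running total rather than one count per unique word
for word in words:
    while i < len(uniques):
        i += 1
        if word in uniques:
            cnt += 1
print(cnt)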

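Now, I think I should look up each word in the "uniques" array, see how many times it occurs in the file, and then add that number to another array that holds the count for each word. But this is where I'm stuck. I don't know how to proceed.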
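A minimal sketch of the approach described above, assuming the words have already been split out of the file into a list: two parallel lists, where uniques[j] is a word and counts[j] is how many times it has been seen, so no dictionary or Counter is needed. The helper name top_words is hypothetical.

```python
def top_words(words, k=10):
    """Return the k most frequent words as (word, count) pairs, most frequent first.

    Uses only two parallel lists (no dict, no Counter): uniques[j] is a word
    and counts[j] is the number of times that word has been seen so far.
    """
    uniques = []
    counts = []
    for word in words:
        if word in uniques:
            # Word seen before: bump the count stored at the same index
            counts[uniques.index(word)] += 1
        else:
            # First occurrence: record the word with a count of 1
            uniques.append(word)
            counts.append(1)

    # Pair each word with its count and sort by count, highest first.
    # sorted() is stable even with reverse=True, so ties keep first-appearance order.
    pairs = sorted(zip(uniques, counts), key=lambda pair: pair[1], reverse=True)
    return pairs[:k]


print(top_words("the cat and the hat and the bat".split(), 2))
# → [('the', 3), ('and', 2)]
```

For the Alice text, the word list could come from something like urllib.request.urlopen(url).read().decode("latin-1").split(). Because uniques.index() and the in test scan the list, this is quadratic in the number of distinct words, which is acceptable for a single book but slow for very large inputs.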

Any help would be appreciated. Thanks.

Best answer

The problem above can be solved easily with Python's collections module.
Here is the solution:

from collections import Counter

# Sample text; adjacent string literals inside parentheses are joined into one string
data_set = ("Welcome to the world of Geeks "
            "This portal has been created to provide well written well "
            "thought and well explained solutions for selected questions "
            "If you like Geeks for Geeks and would like to contribute "
            "here is your chance You can write article and mail your article "
            " to contribute at geeksforgeeks org See your article appearing on "
            "the Geeks for Geeks main page and help thousands of other Geeks. ")

# split() returns a list of all the whitespace-separated words in the string
split_it = data_set.split()

# Pass the split_it list to a Counter, which tallies how often each word occurs
Counters_found = Counter(split_it)
# print(Counters_found)

# most_common(k) returns the k most frequently encountered
# words and their respective counts, from most to least common
most_occur = Counters_found.most_common(4)
print(most_occur)
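Applied to the question's actual task (top 10 words of a text), the same idea might look like the sketch below. Note that it still relies on Counter, so it does not meet the question's no-Counter restriction. The helper name most_common_words is hypothetical, and the normalization (lowercasing and stripping punctuation from both ends of each word, so "Geeks." and "Geeks" count as one word) is an assumption, not part of the answer above.

```python
import string
from collections import Counter


def most_common_words(text, k=10):
    """Return the k most frequent words in text as (word, count) pairs."""
    # Assumption: compare words case-insensitively and ignore punctuation at
    # either end of a word, so "Geeks." and "geeks" are the same word
    words = [w.strip(string.punctuation).lower() for w in text.split()]
    # Drop tokens that were nothing but punctuation (e.g. "--")
    words = [w for w in words if w]
    # most_common(k) orders by count, highest first; equal counts keep
    # the order in which the words were first encountered
    return Counter(words).most_common(k)


print(most_common_words("a b b c c c", 2))
# → [('c', 3), ('b', 2)]
```

To run it on the Alice text, the string could be obtained with urllib.request.urlopen(url).read().decode("latin-1").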

09-04 10:56