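I'm working on a basic practice project. I fetch a simple Wikipedia page and use Beautiful Soup to write all of its content to a text file. Then I count how many times a word appears in that newly written text file.
For some reason, the number I get the first time I run the code differs from the number I get the second time.
I believe "anime.txt" is different on the first run than on the second.
The problem must be in the way I collect all the text data with Beautiful Soup.
Please help.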
from urllib.request import urlopen
from bs4 import BeautifulSoup
f = open("anime.txt", "w", encoding="utf-8")
f.write("")
f.close()
my_url ="https://en.wikipedia.org/wiki/Anime"
uClient = urlopen(my_url)
page_html = uClient.read()
uClient.close()
page_soup = BeautifulSoup(page_html, "html.parser")
p=page_soup.findAll("p")
f = open("anime.txt", "a", encoding="utf-8")
for i in p:
    f.write(i.text)
    f.write("\n\n")
data= open("anime.txt", encoding="utf-8").read()
anime_count = data.count("anime")
Anime_count = data.count("Anime")
print(anime_count,"\n")
print(Anime_count, "\n")
count= anime_count+Anime_count
print("The total number of times the word Anime appears within <p> in the wikipedia page is : ", count)
First output:
anime_count = 14
Anime_count = 97
count = 111
Second output:
anime_count = 23
Anime_count = 139
count = 162
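Edit:
I edited the code based on the first two comments and, of course, it works now :P.
Does this look better in terms of how, and how many times, the file is opened and closed?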
from urllib.request import urlopen
from bs4 import BeautifulSoup
my_url ="https://en.wikipedia.org/wiki/Anime"
uClient = urlopen(my_url)
page_html = uClient.read()
uClient.close()
page_soup = BeautifulSoup(page_html, "html.parser")
p=page_soup.findAll("p")
f = open("anime.txt", "w", encoding="utf-8")
for i in p:
    f.write(i.text)
    f.write("\n\n")
f.close()
data= open("anime.txt", encoding="utf-8").read()
anime_count = data.count("anime")
Anime_count = data.count("Anime")
print(anime_count,"\n")
print(Anime_count, "\n")
count= anime_count+Anime_count
print("The total number of times the word Anime appears within <p> in the wikipedia page is : ", count)
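A side note on the counting step: `str.count` matches substrings, so "anime" is also counted inside longer words such as "animes", and all-caps spellings are missed. The sketch below shows one possible alternative using a case-insensitive, whole-word regular expression. It is not the asker's method; `count_word` and `sample` are hypothetical names made up for illustration.

```python
import re


def count_word(text, word):
    """Count whole-word, case-insensitive occurrences of `word` in `text`."""
    # \b marks a word boundary, so "animes" does not match "anime".
    pattern = r"\b" + re.escape(word) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


# Hypothetical sample text, not taken from the Wikipedia page.
sample = "Anime fans watch anime and animated animes."

# Substring counting, as in the code above: also matches inside "animes".
substring_total = sample.count("anime") + sample.count("Anime")

# Whole-word counting: matches only "Anime" and "anime".
whole_word_total = count_word(sample, "anime")

print(substring_total, whole_word_total)
```

Which of the two counts is "right" depends on whether plurals and compounds should be included, so this is a design choice rather than a fix.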
Best answer
Don't get tangled up in opening and closing files. Put all of the writing and reading parts inside with statements.
from urllib.request import urlopen
from bs4 import BeautifulSoup

# The file is closed (and its buffer flushed) when the with block ends.
with open("anime.txt", "w", encoding="utf-8") as outfile:
    my_url = "https://en.wikipedia.org/wiki/Anime"
    uClient = urlopen(my_url)
    page_html = uClient.read()
    uClient.close()
    page_soup = BeautifulSoup(page_html, "html.parser")
    p = page_soup.findAll("p")
    # Write the text of every <p> element, separated by blank lines.
    for i in p:
        outfile.write(i.text)
        outfile.write("\n\n")

# Reopen the fully written file for reading.
with open("anime.txt", "r", encoding="utf-8") as infile:
    data = infile.read()

anime_count = data.count("anime")
Anime_count = data.count("Anime")
print(anime_count, "\n")
print(Anime_count, "\n")
count = anime_count + Anime_count
print("The total number of times the word Anime appears within <p> in the wikipedia page is : ", count)
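To illustrate why this matters: a likely explanation for the differing counts in the question is that the file opened in append mode was never closed before it was read back, so some of the written text could still be sitting in the writer's buffer rather than on disk. The following is a minimal sketch of that effect, not a reproduction of the original run. The scratch file `demo.txt` and the variable names are hypothetical, and exactly how much text is visible before closing depends on the buffer size.

```python
import os
import tempfile

# Hypothetical scratch file so the sketch does not touch anime.txt.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")
text = "Anime and anime"

# Write without closing: a short string usually stays in the write buffer.
f = open(path, "w", encoding="utf-8")
f.write(text)
with open(path, encoding="utf-8") as infile:
    before_close = infile.read()

# Closing the writer flushes the buffer to disk.
f.close()
with open(path, encoding="utf-8") as infile:
    after_close = infile.read()

# A with block closes (and therefore flushes) the file when it exits.
with open(path, "w", encoding="utf-8") as outfile:
    outfile.write(text)
with open(path, encoding="utf-8") as infile:
    after_with = infile.read()

# Compare how much text a reader could see at each stage.
print(len(before_close), len(after_close), len(after_with))
```

With a with statement there is no way to forget the close, which is why the read that follows always sees the complete file.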
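Regarding "python - Why does my word counter produce a different output the first time I run it than the second time?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/54496973/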