Question
I'm looking to create a dictionary in Python where the key is the HTML tag name and the value is the number of times that tag appears. Is there a way to do this with Beautiful Soup or something else?
Recommended answer
With BeautifulSoup you can search for all tags by omitting the search criteria:
# Print the name of every tag; soup is an already-parsed BeautifulSoup object
for tag in soup.find_all():
    print(tag.name)  # TODO: add/update dict
If you're only interested in the number of occurrences, BeautifulSoup may be overkill, in which case you could use the standard library's HTMLParser instead:
from html.parser import HTMLParser  # Python 3; in Python 2 the module was named HTMLParser

class print_tags(HTMLParser):
    def handle_starttag(self, tag, attrs):
        print(tag)  # TODO: add/update dict

parser = print_tags()
parser.feed(html)  # html is the page source as a string
This will produce the same output.
To create the dictionary of {'tag': count}, you could use collections.defaultdict:
from collections import defaultdict
occurrences = defaultdict(int)
# ...
occurrences[tag_name] += 1
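Putting the two pieces together, the HTMLParser approach and the defaultdict counter could be combined roughly as follows. This is only a minimal sketch for Python 3: the names TagCounter, count_tags, and sample_html are made up for illustration and do not come from the answer above.

```python
from collections import defaultdict
from html.parser import HTMLParser  # Python 3 module name


class TagCounter(HTMLParser):
    """Counts start tags while parsing (hypothetical helper, not from the answer)."""

    def __init__(self):
        super().__init__()
        self.occurrences = defaultdict(int)

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes the tag name already lowercased.
        # Self-closing tags such as <br/> also end up here by default.
        self.occurrences[tag] += 1


def count_tags(html):
    """Return a plain dict mapping each tag name to its number of occurrences."""
    parser = TagCounter()
    parser.feed(html)
    parser.close()
    return dict(parser.occurrences)


# Illustrative input only
sample_html = "<html><body><p>one</p><p>two</p><br/><a href='#'>x</a></body></html>"
tag_counts = count_tags(sample_html)
print(tag_counts)
```

The same counting line, occurrences[tag.name] += 1, should also work inside the BeautifulSoup loop if you are already parsing the page with it.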