Problem Description
Say I have some XML like:
<item name=bread weight="5" edible="yes">
<body> some blah </body>
<item>
<item name=eggs weight="5" edible="yes">
<body> some blah </body>
<item>
<item name=meat weight="5" edible="yes">
<body> some blah </body>
<item>
I want to store the name of each item in a list using Beautiful Soup.
Here is what I have tried so far:
names = list()
for c in soup.findAll("item"):
    # get name from the tag
    names.append(...)  # the name I got from the tag
This method has worked perfectly for extracting text between tags.
I've tried copying the methods used for extracting links such as <a href="www.blah.com">, but it doesn't seem to work.
How would I store the name information in a list? (Other lists contain the body text, so for associativity reasons the indexes have to be consistent.)
Many thanks!
Recommended Answer
Use dict(item.attrs).get('name') to get the name.
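As a side note, in Beautiful Soup 4 a tag's attrs is already a plain dict, and a Tag supports dict-style lookups itself; the dict(...) wrapper comes from Beautiful Soup 3, where attrs was a list of (name, value) pairs. A minimal sketch on a single, well-formed item tag (the markup and the html.parser backend are assumptions chosen for illustration):

```python
from bs4 import BeautifulSoup

# One well-formed item, parsed with Python's built-in html.parser backend
soup = BeautifulSoup('<item name="bread" weight="5" edible="yes"></item>', "html.parser")
item = soup.find("item")

# In bs4, Tag.attrs is a dict mapping attribute name -> value
print(item.attrs)          # {'name': 'bread', 'weight': '5', 'edible': 'yes'}

# Tag.get works like dict.get: it returns None when the attribute is missing
print(item.get("name"))    # bread
print(item.get("colour"))  # None

# Tag["name"] also works, but raises KeyError if the attribute is absent
print(item["name"])        # bread
```

The same pattern is what you would use for links: a.get("href") reads an attribute, while get_text() reads the text between tags.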
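You are having issues because each trailing <item> is supposed to be a closing tag (</item>), but it is written as an opening tag, so you get 6 matches rather than 3. The three extra matches have no attributes, so their name comes back as None. If you have any control over the text, use proper closing tags to avoid this.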
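To see the effect, the sketch below parses the question's markup as written and again with real </item> closing tags, then lists the name of every matched item. It assumes Python's built-in html.parser backend; lxml or html5lib may build a different tree from this malformed input.

```python
from bs4 import BeautifulSoup

# The markup from the question: each "closing" tag is written as <item>
broken = """
<item name=bread weight="5" edible="yes">
<body> some blah </body>
<item>
<item name=eggs weight="5" edible="yes">
<body> some blah </body>
<item>
<item name=meat weight="5" edible="yes">
<body> some blah </body>
<item>
"""
# The same markup with proper closing tags
fixed = broken.replace("<item>\n", "</item>\n")

results = {}
for label, markup in [("broken", broken), ("fixed", fixed)]:
    items = BeautifulSoup(markup, "html.parser").find_all("item")
    # Collect the name attribute of every match (None if it has no name)
    results[label] = [i.get("name") for i in items]
    print(label, len(items), results[label])

# broken 6 ['bread', None, 'eggs', None, 'meat', None]
# fixed 3 ['bread', 'eggs', 'meat']
```

With the broken markup, html.parser treats every <item> as a new, nested element, which is where the three nameless matches come from.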
Here is the full snippet working as intended:
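# soup is the BeautifulSoup object built from the XML above
names = list()
for item in soup.findAll('item'):
    name = dict(item.attrs).get('name')
    # skip the stray <item> tags, which have no name attribute
    if name is not None:
        names.append(name)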
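The question also needs the names to stay index-aligned with the body text. One way to guarantee that (a sketch, not the answerer's code) is to collect both values in the same loop, from the same filtered tags, so a skipped tag can never shift one list relative to the other. It uses find_all, the Beautiful Soup 4 name for findAll, and assumes the html.parser backend.

```python
from bs4 import BeautifulSoup

# Markup from the question, stray <item> "closing" tags included
markup = """
<item name=bread weight="5" edible="yes">
<body> some blah </body>
<item>
<item name=eggs weight="5" edible="yes">
<body> some blah </body>
<item>
<item name=meat weight="5" edible="yes">
<body> some blah </body>
<item>
"""

soup = BeautifulSoup(markup, "html.parser")

names, bodies = [], []
for item in soup.find_all("item"):
    name = item.get("name")
    if name is None:
        # one of the stray <item> tags: no attributes, skip it
        continue
    body = item.find("body")  # first <body> descendant of this item
    names.append(name)
    bodies.append(body.get_text(strip=True) if body is not None else "")

print(names)   # ['bread', 'eggs', 'meat']
print(bodies)  # ['some blah', 'some blah', 'some blah']
```

Because each name and its body text are appended together, names[i] and bodies[i] always describe the same item.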