Question
I'm using Beautiful Soup. There is a tag like this:
<li><a href="example"> s.r.o., <small>small</small></a></li>
I want to get only the text directly inside the anchor <a> tag, without any text from the nested <small> tag in the output; i.e. " s.r.o., "
I tried find('li').text[0], but it does not work.
Is there a command in BS4 that can do this?
Recommended answer
One option would be to get the first element from the contents of the a element:
>>> from bs4 import BeautifulSoup
>>> data = '<li><a href="example"> s.r.o., <small>small</small></a></li>'
>>> soup = BeautifulSoup(data, 'html.parser')  # name the parser explicitly
>>> print(soup.find('a').contents[0])
s.r.o.,
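As a rough illustration of why find('li').text[0] does not give the desired result: .text joins every descendant string into one plain string, so indexing it returns a single character, whereas .contents is a list of the tag's direct child nodes. The sketch below assumes Python 3 and the built-in html.parser; the variable names are only illustrative.

```python
from bs4 import BeautifulSoup

data = '<li><a href="example"> s.r.o., <small>small</small></a></li>'
soup = BeautifulSoup(data, "html.parser")
a = soup.find("a")

full_text = soup.find("li").text   # all descendant strings joined into one str
first_char = full_text[0]          # indexing a str yields a single character
children = a.contents              # list of the direct child nodes of <a>
first_child = str(children[0])     # the text node that precedes <small>

print(repr(full_text))
print(repr(first_char))
print(len(children), repr(first_child))
```

Note that contents[0] only works here because the wanted text happens to be the first child of the a element.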
Another one would be to find the small tag and get its previous sibling:
>>> print(soup.find('small').previous_sibling)
s.r.o.,
Well, there are all sorts of alternative/crazy options also:
>>> print(next(soup.find('a').descendants))
s.r.o.,
>>> print(next(iter(soup.find('a'))))
s.r.o.,
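All of the options above rely on the wanted text being the first child of the a element. If the text might also appear after the nested tag, one possible approach is to keep only the direct children that are text nodes. This is a minimal sketch, not part of the original answer: the helper name direct_text and the extra " Ltd." in the sample markup are hypothetical, and it assumes Python 3 with the built-in html.parser.

```python
from bs4 import BeautifulSoup, NavigableString

def direct_text(tag):
    """Return the text of `tag`'s direct string children, ignoring nested tags."""
    # Tag objects are not NavigableString instances, so <small> is skipped.
    pieces = [str(child).strip() for child in tag.contents
              if isinstance(child, NavigableString)]
    # Drop pieces that were only whitespace and join the rest with one space.
    return " ".join(piece for piece in pieces if piece)

original = '<li><a href="example"> s.r.o., <small>small</small></a></li>'
# Hypothetical variant with text after the nested tag as well.
variant = '<li><a href="example"> s.r.o., <small>small</small> Ltd.</a></li>'

result_original = direct_text(BeautifulSoup(original, "html.parser").find("a"))
result_variant = direct_text(BeautifulSoup(variant, "html.parser").find("a"))

print(result_original)
print(result_variant)
```

Unlike the one-liners, this version strips the surrounding whitespace, so drop the strip() call if the exact spacing matters.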