Problem description
I am trying to extract the innerHTML from a tag using the following code:
from urllib.request import urlopen
from bs4 import BeautifulSoup

theurl = "http://na.op.gg/summoner/userName=Darshan"
thepage = urlopen(theurl)
soup = BeautifulSoup(thepage, "html.parser")
rank = soup.findAll('span', {"class": "tierRank"})
However, I am getting [<span class="tierRank">Master</span>] instead. What I want to show is only the value "Master".
Using soup.get_text instead of soup.findAll does not work.
I tried adding .text and .string to the end of the last line, but that did not work either.
Recommended answer
soup.findAll('span',{"class":"tierRank"}) returns a list of elements that match <span class="tierRank">. A list has no .text or .string attribute, which is why appending those to the call did not work.
- You want the first element from that list.
- You want the innerHTML from that element, which can be accessed by the decode_contents() method.
Together:
rank = soup.findAll('span',{"class":"tierRank"})[0].decode_contents()
This will store "Master" in rank.
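To try this without hitting the network, here is a minimal sketch that runs the same lookup against an inline HTML string. The `html` snippet is a made-up stand-in for the real page (the nested `<b>` tag is an assumption, added only to show how the inner HTML differs from the plain text), and the variable names are illustrative.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the fetched page, so the sketch runs offline.
html = '<div><span class="tierRank"><b>Master</b> tier</span></div>'
soup = BeautifulSoup(html, "html.parser")

# findAll / find_all returns a list-like ResultSet, not a single tag.
matches = soup.find_all("span", {"class": "tierRank"})

# Index into the list to get one Tag before asking for its contents.
first = matches[0]

inner_html = first.decode_contents()  # markup inside the tag, as a string
text_only = first.get_text()          # text only, nested tags removed

print(inner_html)
print(text_only)
```

When the span holds only text, as on the page in the question, both calls should give the same string. If the page might contain no matching span, `matches[0]` raises an IndexError, so checking that `matches` is non-empty first is probably worthwhile.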