This article covers how to extract text that is not wrapped in a tag using BeautifulSoup.
Problem description
My web page looks like this:
<p>
<strong class="offender">YOB:</strong> 1987<br/>
<strong class="offender">RACE:</strong> WHITE<br/>
<strong class="offender">GENDER:</strong> FEMALE<br/>
<strong class="offender">HEIGHT:</strong> 5'05''<br/>
<strong class="offender">WEIGHT:</strong> 118<br/>
<strong class="offender">EYE COLOR:</strong> GREEN<br/>
<strong class="offender">HAIR COLOR:</strong> BROWN<br/>
</p>
I want to extract the info for each individual and get YOB:1987, RACE:WHITE, etc.
What I tried:
subc = soup.find_all('p')
subc1 = subc[1]
subc2 = subc1.find_all('strong')
But this gives me only the labels YOB:, RACE:, etc., without the values that follow them.
Is there a way that I can get the data in YOB:1987, RACE:WHITE format?
Recommended answer
Just loop through all the <strong> tags and use next_sibling to get the text that follows each one. Like this:
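for strong_tag in soup.find_all('strong'):
    # next_sibling is the text node right after </strong>; strip() drops the surrounding whitespace
    print(strong_tag.text, strong_tag.next_sibling.strip())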
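The reason find_all('strong') alone misses the values is that each value is a bare text node sitting next to the tag, not inside it. The short sketch below inspects one row of the question's markup to show this; the variable name value_node is only illustrative, and 'html.parser' is chosen here as an assumption so the example does not depend on optional parsers.

```python
from bs4 import BeautifulSoup, NavigableString

# One row taken from the markup in the question.
html = '<p><strong class="offender">YOB:</strong> 1987<br/></p>'
soup = BeautifulSoup(html, 'html.parser')

strong_tag = soup.find('strong')
value_node = strong_tag.next_sibling  # the node immediately after </strong>

# The value is a text node (NavigableString), not a Tag,
# which is why a tag search never returns it.
print(type(value_node).__name__)
print(isinstance(value_node, NavigableString))
print(value_node.strip())
```

Because NavigableString is a subclass of str, ordinary string methods such as strip() work on it directly.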
Demo:
from bs4 import BeautifulSoup
html = '''
<p>
<strong class="offender">YOB:</strong> 1987<br />
<strong class="offender">RACE:</strong> WHITE<br />
<strong class="offender">GENDER:</strong> FEMALE<br />
<strong class="offender">HEIGHT:</strong> 5'05''<br />
<strong class="offender">WEIGHT:</strong> 118<br />
<strong class="offender">EYE COLOR:</strong> GREEN<br />
<strong class="offender">HAIR COLOR:</strong> BROWN<br />
</p>
'''
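# Name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup(html, 'html.parser')
for strong_tag in soup.find_all('strong'):
    # next_sibling is the text node right after </strong>; strip() drops the surrounding whitespace
    print(strong_tag.text, strong_tag.next_sibling.strip())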
This gives you:
YOB: 1987
RACE: WHITE
GENDER: FEMALE
HEIGHT: 5'05''
WEIGHT: 118
EYE COLOR: GREEN
HAIR COLOR: BROWN
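If the goal is one record per individual rather than printed lines, the same next_sibling idea can fill a dictionary. This is a minimal sketch and not part of the original answer: extract_fields is a hypothetical helper name, the HTML is a shortened copy of the sample above, and the isinstance check is an added guard for rows where a label is followed by another tag or by nothing.

```python
from bs4 import BeautifulSoup, NavigableString

html = '''
<p>
<strong class="offender">YOB:</strong> 1987<br />
<strong class="offender">RACE:</strong> WHITE<br />
<strong class="offender">GENDER:</strong> FEMALE<br />
</p>
'''

def extract_fields(paragraph):
    """Hypothetical helper: map each <strong> label to the text that follows it."""
    fields = {}
    for strong_tag in paragraph.find_all('strong'):
        # "YOB:" -> "YOB"
        label = strong_tag.get_text(strip=True).rstrip(':')
        sibling = strong_tag.next_sibling
        # Accept only a plain text node; skip the label if the next node
        # is a tag or is missing.
        if isinstance(sibling, NavigableString):
            fields[label] = sibling.strip()
    return fields

soup = BeautifulSoup(html, 'html.parser')
record = extract_fields(soup.find('p'))
print(record)
```

For a page with several individuals, calling the helper on each element of soup.find_all('p') would yield one dictionary per person, assuming every person sits in a separate <p> as in the question.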