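I am parsing an HTML document and want to extract a specific tag and handle it separately from the others, but I keep running into problems with tags nested inside other tags. Can anyone suggest a way to get only a tag's own text, without the text of the tags nested inside it?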

<p> I want this text <b> I want to parse this separately </b> I also want this text </p>

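Best answer

You can use NavigableString: iterating over a tag yields its direct children, and the children that are NavigableString instances are the text nodes that sit directly inside the tag.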

from bs4 import BeautifulSoup, NavigableString

html = '''<p> I want this text1 <b> I want to parse this separately1 </b> I also want this text1 </p>
<p> I want this text2 <b> I want to parse this separately2 </b> I also want this text2 </p>'''
soup = BeautifulSoup(html, 'html.parser')
for p in soup.find_all('p'):
    # Iterating over a tag yields its direct children only;
    # keep the text nodes and skip nested tags such as <b>.
    outer_text = ' '.join(x.strip() for x in p if isinstance(x, NavigableString))
    print(outer_text)
    # The nested <b> tag is read separately.
    inner_text = p.b.text.strip()
    print(inner_text)


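Output:

  I want this text1 I also want this text1
  I want to parse this separately1
  I want this text2 I also want this text2
  I want to parse this separately2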
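
As a possible alternative, the same direct text can be collected with find_all(string=True, recursive=False), which asks BeautifulSoup for text nodes among a tag's immediate children only. The sketch below is not part of the original answer: the helper name direct_text is hypothetical, and it assumes a bs4 version that accepts the string argument (older releases used text instead).

```python
from bs4 import BeautifulSoup


def direct_text(tag):
    """Return the text sitting directly inside `tag`, ignoring nested tags.

    `direct_text` is a hypothetical helper name used for illustration.
    """
    # string=True matches text nodes; recursive=False restricts the search
    # to the tag's immediate children, so text inside <b> is not included.
    pieces = tag.find_all(string=True, recursive=False)
    # Drop whitespace-only pieces and join the rest with single spaces.
    return ' '.join(s.strip() for s in pieces if s.strip())


html = '<p> I want this text <b> I want to parse this separately </b> I also want this text </p>'
soup = BeautifulSoup(html, 'html.parser')
p = soup.p

outer = direct_text(p)             # text belonging to <p> itself
inner = p.b.get_text(strip=True)   # text of the nested <b> tag

print(outer)
print(inner)
```

Note that p.b is None when a paragraph has no b child, so real input may need a check before calling get_text on it.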

This answer comes from a similar question on Stack Overflow about getting only the direct text of a tag with BeautifulSoup in Python: https://stackoverflow.com/questions/48767044/
