python - Using "renderContents" in BeautifulSoup in Python

Environment: Python 2.7 + BeautifulSoup 4.3.2

This is part of the original HTML:

<dl><dt>Newest Item:</dt><dd><span class="NewsTime" title="Southeast in 2007">SE, 2007</span></dd></dl>

I want to extract the text "SE, 2007".

This is what I came up with:

from bs4 import BeautifulSoup
import re
import urllib2

url = "http://sample.com"
page = urllib2.urlopen(url)
soup = BeautifulSoup(page.read())

NEWS = soup.find_all("span", class_="NewsTime", limit=1)  # limit=1: the page has two such spans, keep only the first

for LA in NEWS:
    print LA.renderContents()

This works. But when I change the last two lines to:

print NEWS.renderContents()

it fails. Why? Also, is my understanding of the original HTML correct?

<dl> is the parent
<dt> and <dd> are children of <dl>
<span> is a child of <dd>

Best answer

As far as BeautifulSoup is concerned, NEWS is a ResultSet. It does not matter that the set holds only one result: it is still a ResultSet, and you cannot call renderContents() on a ResultSet.

The find_all() function always returns a bs4.element.ResultSet containing zero or more elements of type bs4.element.Tag. You can call renderContents() only on a Tag object.
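The distinction can be checked directly. The following is a minimal sketch, assuming the HTML fragment from the question is parsed from a string with the built-in "html.parser" rather than fetched from a URL. The lowercase variable names are illustrative only. It uses get_text() to read the text, since renderContents() is a legacy BeautifulSoup 3 name that bs4 keeps for compatibility.

```python
from bs4 import BeautifulSoup
from bs4.element import ResultSet, Tag

# The fragment from the question, inlined so the example needs no network access.
html = ('<dl><dt>Newest Item:</dt><dd><span class="NewsTime" '
        'title="Southeast in 2007">SE, 2007</span></dd></dl>')
soup = BeautifulSoup(html, "html.parser")

news = soup.find_all("span", class_="NewsTime", limit=1)

# find_all() returns a list-like ResultSet, even with limit=1.
is_result_set = isinstance(news, ResultSet)
# Each element inside the ResultSet is a Tag.
first_is_tag = isinstance(news[0], Tag)

# Calling a Tag method on the ResultSet itself raises AttributeError.
try:
    news.renderContents()
    call_failed = False
except AttributeError:
    call_failed = True

# Calling a Tag method on an element of the ResultSet works.
text = news[0].get_text()
print(text)
```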

In this case, to avoid the for loop, you can take the first element (index 0) on the first line:

NEWS = soup.find_all("span",class_="NewsTime", limit=1)[0]

print(NEWS.renderContents())
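Indexing with [0] raises IndexError when nothing matches. One possible alternative, not part of the original answer, is find(), which returns a single Tag or None. The sketch below again assumes the question's HTML inlined as a string and parsed with "html.parser", and its variable names are illustrative. It also checks the hierarchy described in the question through the .parent attribute.

```python
from bs4 import BeautifulSoup

# The fragment from the question, inlined so the example needs no network access.
html = ('<dl><dt>Newest Item:</dt><dd><span class="NewsTime" '
        'title="Southeast in 2007">SE, 2007</span></dd></dl>')
soup = BeautifulSoup(html, "html.parser")

# find() returns the first matching Tag, or None if there is no match.
span = soup.find("span", class_="NewsTime")

if span is not None:
    text = span.get_text()
    parent_name = span.parent.name                # "dd": <span> is a child of <dd>
    grandparent_name = span.parent.parent.name    # "dl": <dd> is a child of <dl>
else:
    text = parent_name = grandparent_name = None

# Direct child tags of <dl>; recursive=False stops the search at one level.
dl_children = [child.name for child in soup.dl.find_all(recursive=False)]

print(text)
print(dl_children)
```

For this fragment, the check agrees with the question's reading of the structure: <dl> contains <dt> and <dd>, and <span> sits inside <dd>.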

Source: a similar question on Stack Overflow: https://stackoverflow.com/questions/21251055/