Problem description
Here is the HTML (parsed with lxml); it is saved as sample.html.
<html>
<body>
<div class ="ecopyramid">
<ul id ="producers">
<li class ="producerlist">
<div class ="name">A1</div>
<div class ="number">100000</div>
</li>
<li class ="producerlist">
<div class ="name">B1</div>
<div class ="number">100000</div>
</li>
</ul>
<ul id ="primaryconsumers">
<li class ="primaryconsumerlist">
<div class ="name">A2</div>
<div class ="number">1000</div>
</li>
<li class ="primaryconsumerlist">
<div class ="name">B2</div>
<div class ="number">2000</div>
</li>
</ul>
<ul id ="secondaryconsumers">
<li class ="secondaryconsumerlist">
<div class ="name">A3</div>
<div class ="number">100</div>
</li>
<li class ="secondaryconsumerlist">
<div class ="name">B3</div>
<div class ="number">98</div>
</li>
</ul>
<ul id ="tertiaryconsumers">
<li class ="tertiaryconsumerlist">
<div class ="name">A4</div>
<div class ="number">80</div>
</li>
<li class ="tertiaryconsumerlist">
<div class ="name">B4</div>
<div class ="number">50</div>
</li>
</ul>
</div>
</body>
</html>
Here is the code to navigate through the sample.html above:
from bs4 import BeautifulSoup

with open("sample.html", "r") as sample_pyramid:
    soup = BeautifulSoup(sample_pyramid, "lxml")

# Locate the <ul> by tag and id, then walk down to the first <li> and its first <div>.
soup_object = soup.find("ul", id="secondaryconsumers")
print(soup_object.li.div.string)
In this code I first locate the parent of the text "A3" by the tag "ul" and the id "secondaryconsumers", then narrow it down in the print command with the ".li.div.string" suffix, which outputs the desired text "A3". My questions are as follows:
1) How do I call/print the text "B3" in this example?
2) How do I call/print the text "98" (below "B3") in this example?
I have tried many things without success. I can reach the first text object through navigation, but not the second text object that sits inside the same kind of tag.
Any ideas?
Recommended answer
You can use CSS selectors to get the names and numbers:
names = soup.select('ul#secondaryconsumers > li.secondaryconsumerlist > div.name')
numbers = soup.select('ul#secondaryconsumers > li.secondaryconsumerlist > div.number')
print([name.text for name in names])
print([number.text for number in numbers])
Prints (Python 3):
['A3', 'B3']
['100', '98']
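To pull out only "B3" and "98" rather than the full lists, one option is to index into the matches. The sketch below is an illustration, not part of the original answer: it inlines a trimmed copy of the secondaryconsumers list instead of reading sample.html, and it uses "html.parser" only so that no extra parser is needed ("lxml" should behave the same for this markup). The variable names are made up for the example.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the secondaryconsumers list from sample.html (assumption:
# inlined here so the sketch runs without the file on disk).
html = """
<ul id="secondaryconsumers">
<li class="secondaryconsumerlist">
<div class="name">A3</div>
<div class="number">100</div>
</li>
<li class="secondaryconsumerlist">
<div class="name">B3</div>
<div class="number">98</div>
</li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")
ul = soup.find("ul", id="secondaryconsumers")

# ul.li only ever returns the first <li>; find_all returns every match,
# so the second item is at index 1.
second_item = ul.find_all("li", class_="secondaryconsumerlist")[1]

name_div = second_item.find("div", class_="name")
b3_name = name_div.get_text(strip=True)

# The number sits in the next <div> sibling of the name.
b3_number = name_div.find_next_sibling("div", class_="number").get_text(strip=True)

# The same element can also be reached with a single CSS selector.
b3_via_css = soup.select_one(
    "ul#secondaryconsumers > li:nth-of-type(2) > div.name"
).get_text(strip=True)

print(b3_name)     # B3
print(b3_number)   # 98
print(b3_via_css)  # B3
```

Indexing with [1] raises IndexError if the list has fewer than two items, so real code would probably check the length first.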
Example code for the follow-up question in the comments:
from bs4 import BeautifulSoup
data = """
<div class="span9">
<table class="result-data table" border="0">
<tbody>
<tr class="result-item highlighting">
<td class="result-category" scope="row">Name:</td>
<td class="result-value-bold" colspan="4" itemprop="item">
Robin Hood
</td>
</tr>
</tbody>
</table>
</div>
"""
# Name the parser explicitly so bs4 does not warn about guessing one.
soup = BeautifulSoup(data, "lxml")
print(soup.find('td', class_="result-value-bold").get_text(strip=True))
Prints Robin Hood.
Or, alternatively, first find the parent table and tr:
table = soup.find('table', class_='result-data')
tr = table.find('tr', class_='result-item')
print(tr.find('td', class_="result-value-bold").get_text(strip=True))
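If the table has several label/value rows, it may be handier to start from each label cell and step to its neighbouring value cell. This is a rough sketch under assumptions: it reuses the single-row markup from the example above, uses "html.parser" to avoid an extra dependency, and the dictionary name `fields` is invented for illustration.

```python
from bs4 import BeautifulSoup

# Same markup as the follow-up example, inlined so the sketch is self-contained.
data = """
<div class="span9">
<table class="result-data table" border="0">
<tbody>
<tr class="result-item highlighting">
<td class="result-category" scope="row">Name:</td>
<td class="result-value-bold" colspan="4" itemprop="item">
Robin Hood
</td>
</tr>
</tbody>
</table>
</div>
"""

soup = BeautifulSoup(data, "html.parser")

fields = {}
for row in soup.find_all("tr", class_="result-item"):
    label = row.find("td", class_="result-category")
    if label is None:
        continue  # skip rows that have no label cell
    # find_next_sibling("td") skips the whitespace text node between the cells.
    value = label.find_next_sibling("td")
    if value is None:
        continue  # skip rows that have a label but no value cell
    key = label.get_text(strip=True).rstrip(":")
    fields[key] = value.get_text(strip=True)

print(fields)  # {'Name': 'Robin Hood'}
```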