I want to extract the value I marked in the image below...
What I tried:
import requests
from lxml import html

url = 'http://site.ir'
content = requests.get(url).content
tree = html.fromstring(content)
print([e.text_content() for e in tree.xpath('//div[@class="grouptext"]/????')])
The text is not inside the span tag, and it is not inside the br tag either.
Update
Suppose I have:
out = """
<div class="groupinfo">
<div class="grouptext">
<span style="color:#5f0101">
span tag contents
</span>
WHAT I WANT
<br></br>
</div>
</div>
<div class="groupinfo">
<div class="grouptext">
<span style="color:#5f0101">
span tag contents
</span>
WHAT I WANT(1)
<br></br>
</div>
</div>
<div class="groupinfo">
<div class="grouptext">
<span style="color:#5f0101">
span tag contents
</span>
WHAT I WANT(2)
<br></br>
</div>
</div>
<div class="groupinfo">
<div class="grouptext">
<span style="color:#5f0101">
span tag contents
</span>
WHAT I WANT(3)
<br></br>
</div>
</div>
"""
Best answer
Another option is to get the text siblings that follow the span:
//div[@class="grouptext"]/span[1]/following-sibling::text()
Demo:
from lxml import html
data = """
<div class="groupinfo">
<div class="grouptext">
<span style="color:#5f0101">
span tag contents
</span>
WHAT I WANT
<br></br>
</div>
</div>
"""
tree = html.fromstring(data)
print(tree.xpath('//div[@class="grouptext"]/span[1]/following-sibling::text()')[0].strip())
Prints:
WHAT I WANT
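The same text can also be reached without an XPath axis: in lxml's element model, text that comes after an element's closing tag is stored in that element's `tail` attribute, so the string after `</span>` is `span.tail`. A minimal sketch, assuming a `<span>` always sits directly before the wanted text (the `results` list is only for illustration):

```python
from lxml import html

data = """
<div class="groupinfo">
<div class="grouptext">
<span style="color:#5f0101">
span tag contents
</span>
WHAT I WANT
<br></br>
</div>
</div>
"""

tree = html.fromstring(data)
results = []
# Select the first <span> inside each grouptext div
for span in tree.xpath('//div[@class="grouptext"]/span[1]'):
    # The text between </span> and <br> is the span's tail (None if absent)
    results.append((span.tail or "").strip())

print(results[0])
```

Because `tail` belongs to the span itself, this avoids any dependence on how many text nodes follow the `<br>`.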
For the updated example, this is what worked for me:
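tree = html.fromstring(out)  # parse the updated multi-div example
# [1] keeps only the first text node after each span, skipping the blank text after <br>
for result in tree.xpath('//div[@class="grouptext"]/span/following-sibling::text()[1]'):
    print(result.strip())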
Prints:
WHAT I WANT
WHAT I WANT(1)
WHAT I WANT(2)
WHAT I WANT(3)
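If the position of the wanted text relative to the `<span>` can vary, another option is to take every direct text child of each grouptext div and drop the whitespace-only ones with the XPath predicate `[normalize-space()]`. This is a sketch that assumes the wanted string is the only non-blank text placed directly inside each div. The span's own contents are excluded automatically because they are not direct children:

```python
from lxml import html

out = """
<div class="groupinfo"><div class="grouptext">
<span style="color:#5f0101">span tag contents</span>
WHAT I WANT(2)
<br></br>
</div></div>
<div class="groupinfo"><div class="grouptext">
<span style="color:#5f0101">span tag contents</span>
WHAT I WANT(3)
<br></br>
</div></div>
"""

tree = html.fromstring(out)
results = []
for div in tree.xpath('//div[@class="grouptext"]'):
    # text() = direct text children only; normalize-space() drops blank ones
    for node in div.xpath('text()[normalize-space()]'):
        results.append(node.strip())

for r in results:
    print(r)
```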