Problem description
The webpages I want to scrape have similar structures. Each has one paragraph that is a question and one paragraph that is an answer. I want to scrape each question and answer and store them in two items.
The problem is that on some pages the question and the answer are //xxx/p[1] and //xxx/p[2] respectively, but on other pages //xxx/p[1] is an empty paragraph without any text, which serves as extra spacing. For these pages, //xxx/p[1] won't give me what I want.
So is there an XPath expression that can select the non-empty paragraphs under one node?
Recommended answer
If the empty paragraphs contain no text at all, you can use
//p[.//text()]
to select paragraphs with text. If the "empty" paragraphs contain whitespace (e.g. newlines), you have to normalize the whitespace first:
//p[normalize-space(.//text())]
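This can be shortened to the following, which also tests the paragraph's whole string value rather than only its first text node: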
//p[normalize-space()]
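
As a rough illustration of what the //p[normalize-space()] filter does, here is a minimal Python sketch that uses only the standard library. It rests on several assumptions: the sample markup, the SAMPLE and XML_WHITESPACE names, and the non_empty_paragraphs helper are all hypothetical, and because xml.etree.ElementTree supports only a limited XPath subset without normalize-space(), the check is emulated by joining each paragraph's text and stripping whitespace. With an XPath 1.0 engine, the expression above can be used directly instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample page: the first <p> is a whitespace-only spacer.
SAMPLE = """<div>
  <p>
  </p>
  <p>What is <b>XPath</b>?</p>
  <p>A query language for XML.</p>
</div>"""

# The characters that XPath's normalize-space() treats as whitespace.
XML_WHITESPACE = " \t\r\n"


def non_empty_paragraphs(node):
    """Return the <p> elements at or below node that contain real text."""
    result = []
    for p in node.iter("p"):
        # Join all nested text, similar to the string value XPath tests.
        text = "".join(p.itertext()).strip(XML_WHITESPACE)
        if text:
            result.append(p)
    return result


root = ET.fromstring(SAMPLE)
paragraphs = non_empty_paragraphs(root)

# The first two non-empty paragraphs are taken as question and answer.
question = "".join(paragraphs[0].itertext()).strip(XML_WHITESPACE)
answer = "".join(paragraphs[1].itertext()).strip(XML_WHITESPACE)
```

Because the spacer paragraph is filtered out before indexing, the question and answer end up at the same positions on both kinds of pages.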