How to select non-empty paragraphs with XPath?

Problem description

The webpages I want to scrape have similar structures. Each has one paragraph that is a question and one paragraph that is an answer. I want to scrape each question and answer and store them in two items.

The problem is that on some pages the question and the answer are //xxx/p[1] and //xxx/p[2] respectively, but on other pages //xxx/p[1] is an empty paragraph without any text that only serves as extra spacing. For these pages, //xxx/p[1] won't give me what I want.

So is there an XPath expression that can select non-empty paragraphs under one node?

Recommended answer

If there's no text at all, you can use

//p[.//text()]

to select paragraphs with text. If the "empty" paragraphs contain whitespace (e.g. newlines), you have to normalize the whitespace first:

//p[normalize-space(.//text())]

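This can be shortened to the form below. In XPath 1.0 the shorter form is also more robust: normalize-space(.//text()) converts the node-set to a string using only its first text node, whereas normalize-space() with no argument uses the paragraph's whole string value: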

//p[normalize-space()]
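
As a rough illustration of the difference between the two predicates, here is a minimal Python sketch using lxml. The sample markup, the wrapper element name `div` (standing in for the `xxx` in the question), and the helper `texts` are all assumptions made for the example, not part of the original pages.

```python
from lxml import etree

# Hypothetical sample markup: the first <p> is a whitespace-only spacer,
# and "div" stands in for the "xxx" element from the question.
SAMPLE = """
<root>
  <div>
    <p>
    </p>
    <p>What is XPath?</p>
    <p>A query language for <b>XML</b> documents.</p>
  </div>
</root>
"""

root = etree.fromstring(SAMPLE)


def texts(expr):
    """Return the whitespace-normalized text of every node matched by expr."""
    return [" ".join(node.xpath("string()").split()) for node in root.xpath(expr)]


# .//text() is true for any text node, so the whitespace-only spacer still matches.
with_text_nodes = texts("//div/p[.//text()]")

# normalize-space() collapses whitespace in the paragraph's whole string value,
# so the spacer becomes an empty string and is filtered out.
non_empty = texts("//div/p[normalize-space()]")

# Predicates apply left to right, so [1] and [2] index the non-empty
# paragraphs under each parent rather than all paragraphs.
question = texts("//div/p[normalize-space()][1]")[0]
answer = texts("//div/p[normalize-space()][2]")[0]

print(with_text_nodes)
print(non_empty)
print(question, "|", answer)
```

With the positional predicate placed after the filter, the same two expressions should work both on pages that have the empty spacer paragraph and on pages that do not.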


10-21 09:14