Why does XPath find the excluded node again?

Problem description

Consider this page:

<n1 class="a">
  1
</n1>
<n1 class="b">
  <b>bold</b>
  2
</n1>

If I first select the second n1 using class="b", I should be excluding the first n1, and indeed this appears true:

library(rvest)
b_nodes = read_html('<n1 class="a">1</n1>
<n1 class="b"><b>bold</b>2</n1>') %>%
  html_nodes(xpath = '//n1[@class="b"]')
b_nodes
# {xml_nodeset (1)}
# [1] <n1 class="b"><b>bold</b>2</n1>

However, if we now query this "subsetted" page:

b_nodes %>% html_nodes(xpath = '//n1')
# {xml_nodeset (2)}
# [1] <n1 class="a">1</n1>
# [2] <n1 class="b"><b>bold</b>2</n1>

How did the 1 node get "re-discovered"?

Note: I know how to get what I want with two separate XPath queries. This is a conceptual question about why the "subsetting" didn't work as expected. My understanding was that b_nodes should have excluded the first node altogether, so the b_nodes object shouldn't even know that node exists.

Recommended answer

html_nodes(xpath = '//n1')

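// is short for /descendant-or-self::node()/, so //n1 expands to /descendant-or-self::node()/n1. Because the path starts with /, it is evaluated from the document root rather than from the nodes in b_nodes, so the whole document is searched. The nodes in b_nodes still belong to the original document; selecting them does not copy them into a new, smaller one.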

Change it to .//n1. The leading . makes the path relative, so the search starts from the nodes you selected before.
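The difference between the absolute and relative forms can be sketched as follows. This is a minimal illustration that reuses the question's markup; the variable names (page, absolute, relative, self_or_below, bold) are only illustrative. One detail worth knowing: .//n1 looks only below the context node, so it should come back empty here because the selected n1 contains no nested n1. To include the context node itself, the descendant-or-self axis can be used.

```r
library(rvest)

page <- read_html('<n1 class="a">1</n1>
<n1 class="b"><b>bold</b>2</n1>')

# Keep only the n1 element with class "b".
b_nodes <- page %>% html_nodes(xpath = '//n1[@class="b"]')

# Absolute path: starts at the document root, so every n1 in the
# document matches, including the one that was "excluded".
absolute <- b_nodes %>% html_nodes(xpath = '//n1')

# Relative path: starts at each node in b_nodes and looks only at
# what lies below it; the context node itself is not a candidate.
relative <- b_nodes %>% html_nodes(xpath = './/n1')

# descendant-or-self also considers the context node itself.
self_or_below <- b_nodes %>% html_nodes(xpath = 'descendant-or-self::n1')

# A relative search for an element nested inside the selected node.
bold <- b_nodes %>% html_nodes(xpath = './/b')

length(absolute)
length(relative)
length(self_or_below)
length(bold)
```

If the goal is simply to keep working with the nodes already in b_nodes, no second query is needed; relative paths are useful when drilling further into them, as with .//b above.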

