Problem description
Using R and the XML package, I'm parsing huge XML files. As part of the data handling I need to know, for each node in a long list of nodes, how many children with a specific name it has (the number of nodes can exceed 20,000).
My current approach is:
nChildrenWithName <- xpathSApply(doc, path="/path/to/node/*", namespaces=ns, xmlName) == 'NAME'
nChildren <- xpathSApply(doc, path="/path/to/node", namespaces=ns, fun=xmlSize)
nID <- sapply(split(nChildrenWithName, rep(seq(along=nChildren), nChildren)), sum)
This is as vectorized as I can get it. Still, I have the feeling that this can be achieved in a single call using the right XPath expression. My knowledge of XPath is limited, though, so if anyone knows how to do it I would be grateful for some insight...
Best, Thomas
Recommended answer
If I understand the question correctly, the XML looks like this:
<path>
  <to>
    <node>
      <NAME>A</NAME>
      <NAME>B</NAME>
      <NAME>C</NAME>
    </node>
    <node>
      <NAME>X</NAME>
      <NAME>Y</NAME>
    </node>
  </to>
  <to>
    <node>
      <NAME>AA</NAME>
      <NAME>BB</NAME>
      <NAME>CC</NAME>
    </node>
  </to>
</path>
and what is wanted is the number of NAME elements under each node element, so 3, 2, 3 in the example above.
This is not possible in XPath 1.0: an expression can return a node-set or a single value (a number, string, or boolean), but not a list of computed values.
Using XPath 2.0 you can write:
for $node in /path/to/node return count($node/NAME)
Or simply:
/path/to/node/count(NAME)
(You can test them here)
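The R XML package is built on libxml2, which implements XPath 1.0, so the XPath 2.0 expressions above cannot be passed to xpathSApply directly. A minimal sketch of a workaround, assuming the un-namespaced sample document from this answer: select each node once with XPath and do the counting in R inside the callback. The variable names xml_text and counts are illustrative only, and the asker's namespaces argument is left out for simplicity.

```r
library(XML)

# Sample document from the answer above (no namespaces, for simplicity).
xml_text <- "<path>
  <to>
    <node><NAME>A</NAME><NAME>B</NAME><NAME>C</NAME></node>
    <node><NAME>X</NAME><NAME>Y</NAME></node>
  </to>
  <to>
    <node><NAME>AA</NAME><NAME>BB</NAME><NAME>CC</NAME></node>
  </to>
</path>"
doc <- xmlParse(xml_text, asText = TRUE)

# Select every node once, then count its children named NAME in R.
# xmlChildren() returns a list named by child element name.
counts <- xpathSApply(doc, "/path/to/node",
                      function(n) sum(names(xmlChildren(n)) == "NAME"))
print(counts)
```

For the sample document this should yield 3, 2, 3, matching the counts given in the answer. A node with no NAME children contributes 0 to the result rather than being dropped.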