Problem description
I want to extract attributes from an XML file using Pig Latin.
Here is a sample of the XML file:
<CATALOG>
<BOOK>
<TITLE test="test1">Hadoop Definitive Guide</TITLE>
<AUTHOR>Tom White</AUTHOR>
<COUNTRY>US</COUNTRY>
<COMPANY>CLOUDERA</COMPANY>
<PRICE>24.90</PRICE>
<YEAR>2012</YEAR>
</BOOK>
</CATALOG>
I used this script, but it didn't work:
REGISTER ./piggybank.jar
DEFINE XPath org.apache.pig.piggybank.evaluation.xml.XPath();
A = LOAD './books.xml' using org.apache.pig.piggybank.storage.XMLLoader('BOOK') as (x:chararray);
B = FOREACH A GENERATE XPath(x, 'BOOK/TITLE/@test'), XPath(x, 'BOOK/PRICE');
dump B;
The output is:
(,24.90)
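The expression itself is valid XPath, which can be checked by evaluating it outside Pig with the JDK's standard javax.xml.xpath API. This is a sketch, assuming the UDF receives the single BOOK element that XMLLoader('BOOK') emits as its record; the class name AttributeXPathCheck is made up for illustration, and the record is a trimmed copy of the sample.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class AttributeXPathCheck {
    // Trimmed copy of one <BOOK> record, as XMLLoader('BOOK') would pass it to the UDF.
    static final String BOOK =
        "<BOOK><TITLE test=\"test1\">Hadoop Definitive Guide</TITLE>"
        + "<AUTHOR>Tom White</AUTHOR><PRICE>24.90</PRICE></BOOK>";

    // Parses the XML string and returns the string value of the XPath expression.
    static String eval(String xml, String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            return XPathFactory.newInstance().newXPath().evaluate(expr, doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // A standard XPath engine resolves the attribute step without trouble.
        System.out.println(eval(BOOK, "BOOK/TITLE/@test")); // test1
        System.out.println(eval(BOOK, "BOOK/PRICE"));       // 24.90
    }
}
```

Since a standard engine returns test1 for the attribute, the empty first field in (,24.90) points at the UDF rather than at the expression.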
I hope someone can help me with this. Thanks.
Recommended answer
There are two bugs in piggybank's XPath class:
- The ignoreNamespace logic breaks searching for XML attributes.
- The ignoreNamespace parameter defaults to true and cannot be overridden: https://issues.apache.org/jira/browse/PIG-4752
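The first bug can be illustrated with a toy namespace-ignoring rewrite. This sketch assumes the rewrite turns each path step into *[local-name()='step']; that is an assumption for illustration, not a reading of piggybank's source, and the class and method names are hypothetical. If the attribute step is not special-cased, @test becomes an element test for an element named "@test", which can never match.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class LocalNameRewriteDemo {
    static final String BOOK =
        "<BOOK><TITLE test=\"test1\">Hadoop Definitive Guide</TITLE></BOOK>";

    // Parses the XML string and returns the string value of the XPath expression.
    static String eval(String xml, String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            return XPathFactory.newInstance().newXPath().evaluate(expr, doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Hypothetical namespace-ignoring rewrite: every step becomes a local-name() test.
    // With handleAttributes=false, "@test" is wrongly turned into an element test.
    static String rewrite(String path, boolean handleAttributes) {
        StringBuilder out = new StringBuilder();
        for (String step : path.split("/")) {
            if (out.length() > 0) {
                out.append('/');
            }
            if (handleAttributes && step.startsWith("@")) {
                // Keep the attribute axis and match on the attribute's local name.
                out.append("@*[local-name()='").append(step.substring(1)).append("']");
            } else {
                out.append("*[local-name()='").append(step).append("']");
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String naive = rewrite("BOOK/TITLE/@test", false);
        String fixed = rewrite("BOOK/TITLE/@test", true);
        System.out.println(naive + " -> '" + eval(BOOK, naive) + "'"); // empty match
        System.out.println(fixed + " -> '" + eval(BOOK, fixed) + "'"); // test1
    }
}
```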
Here is my workaround using XPathAll:
XPathAll(x, 'BOOK/TITLE/@test', true, false).$0 as (test:chararray)
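In this call XPathAll returns a tuple of matches and .$0 projects the first one; the final false presumably turns namespace-ignoring off, and the meaning of the third argument (true) is not stated in the answer, so treat it as an assumption. The JAXP analogue below is a sketch: it evaluates the path as a node set and keeps every match in document order, so index 0 plays the role of .$0. The class and method names are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class AllMatchesDemo {
    static final String BOOK =
        "<BOOK><TITLE test=\"test1\">Hadoop Definitive Guide</TITLE>"
        + "<AUTHOR>Tom White</AUTHOR><PRICE>24.90</PRICE></BOOK>";

    // Returns the text of every node the expression selects, in document order.
    static List<String> allMatches(String xml, String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expr, doc, XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getTextContent());
            }
            return values;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // First match only, analogous to .$0 on the XPathAll tuple.
        System.out.println(allMatches(BOOK, "BOOK/TITLE/@test").get(0)); // test1
        // Several matches come back in document order.
        System.out.println(allMatches(BOOK, "BOOK/*")); // [Hadoop Definitive Guide, Tom White, 24.90]
    }
}
```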
Also, if you still need to ignore namespaces:
XPathAll(x, '//*[local-name()=\'BOOK\']//*[local-name()=\'TITLE\']/@test', true, false).$0 as (test:chararray)
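The local-name() form matters when elements sit in a namespace: in XPath 1.0 an unprefixed name such as BOOK only matches elements with no namespace. The sketch below uses a namespace-aware JAXP parser and a hypothetical namespaced variant of the sample (the URI urn:example:books is made up). The plain path finds nothing, while the local-name() path from the answer still returns the attribute. The attribute step can stay as @test because an unprefixed attribute does not belong to the default namespace.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class NamespaceXPathDemo {
    // Hypothetical namespaced variant of the sample record.
    static final String NS_BOOK =
        "<BOOK xmlns=\"urn:example:books\">"
        + "<TITLE test=\"test1\">Hadoop Definitive Guide</TITLE></BOOK>";

    // Parses with namespace awareness on and returns the expression's string value.
    static String eval(String xml, String expr) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // keep namespace information in the DOM
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            return XPathFactory.newInstance().newXPath().evaluate(expr, doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Unprefixed names only match no-namespace elements, so this finds nothing.
        System.out.println("[" + eval(NS_BOOK, "BOOK/TITLE/@test") + "]"); // []
        // Matching on local-name() ignores the element namespace.
        System.out.println(eval(NS_BOOK,
            "//*[local-name()='BOOK']//*[local-name()='TITLE']/@test")); // test1
    }
}
```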