I am using Hive to parse XML files, and for that I am using hivexmlserde.
When I run my code, I get the following error:

FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException: The number of XPath expressions does not much the number of columns

However, the number of columns and the number of XPath expressions in my table are the same.

Here is my code:

```sql
add jar /home/cloudera/hivexmlserde-1.0.5.3.jar;
CREATE EXTERNAL TABLE INFO(
statusCode string,
title string,
startTime string,
endTime string,
frequencyValue string,
frequencyUnits string,
strengthValue string,
strengthUnits string,
routecode string,
routecodeSystem string,
routedisplayName string,
routecodesystemName string,
ugcode string,
uname string,
ucodeSystem string,
codeSystemName string,
ageForm string,
tr_code string,
tr_description string,
tr_codesystem string,
tr_codesystemname string
)
ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
WITH SERDEPROPERTIES (
"column.xpath.statusCode"="Document/xxx/statusCode/text()",
"column.xpath.title"="Document/xxx/code/code/text()",
"column.xpath.startTime"="Document/xxx/startTime/text()",
"column.xpath.endTime"="Document/xxx/endTime/text()",
"column.xpath.frequencyValue"="Document/xxx/frequencyValue/text()",
"column.xpath.frequencyUnits"="Document/xxx/frequencyUnits/text()",
"column.xpath.strengthValue"="Document/xxx/strengthValue/text()",
"column.xpath.strengthUnits"="Document/xxx/strengthUnits/text()",
"column.xpath.routecode"="Document/xxx/entryInfo/routeCode/code/text()",
"column.xpath.routecodeSystem"="Document/xxx/entryInfo/routeCode/codeSystem/text()",
"column.xpath.routedisplayName"="Document/xxx/entryInfo/routeCode/displayName/text()",
"column.xpath.routecodesystemName"="Document/xxx/entryInfo/routeCode/codeSystemName/text()",
"column.xpath.ugcode"="Document/xxx/entryInfo/productCode/code/text()",
"column.xpath.ugname"="Document/xxx/entryInfo/productCode/displayName/text()",
"column.xpath.ugcodeSystem"="Document/xxx/entryInfo/productCode/codeSystem/text()",
"column.xpath.ugcodeSystemName"="Document/xxx/entryInfo/productCode/codeSystemName/text()",
"column.xpath.dosageForm"="Document/xxx/entryInfo/ageForm/displayName/text()",
"column.xpath.tr_code"="Document/xxx/entryInfo/productCode/translation/code/text()",
"column.xpath.tr_description"="Document/xxx/entryInfo/productCode/translation/displayName/text()",
"column.xpath.tr_codesystem"="Document/xxx/entryInfo/productCode/translation/codeSystem/text()",
"column.xpath.tr_codesystem"="Document/xxx/entryInfo/productCode/translation/codeSystemName/text()"
)
STORED AS
INPUTFORMAT 'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
TBLPROPERTIES (
"xmlinput.start"="<Document",
"xmlinput.end"="</Document>");
```
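Counting lines on each side of the DDL is easy to get wrong, because Hive stores SERDEPROPERTIES as a key/value map, so a key written twice is only stored once. The sketch below is a hypothetical Python helper (`check_xpath_properties` is not part of Hive or hivexmlserde) that compares the declared columns with the *distinct* `column.xpath.*` keys. It assumes the simple one-column-per-line layout used in the question and compares names case-insensitively, since Hive column names are case-insensitive.

```python
import re
from collections import Counter

# Hypothetical pre-flight check (not part of Hive or hivexmlserde): compares
# the columns of a CREATE TABLE statement with its "column.xpath.*" keys.
XPATH_KEY = re.compile(r'"column\.xpath\.([^"]+)"\s*=')
# Assumes one "name type," column definition per line.
COLUMN_DEF = re.compile(r"^[ \t]*(\w+)[ \t]+\w+", re.MULTILINE)


def check_xpath_properties(ddl: str) -> dict:
    # Column definitions sit between the first "(" and "ROW FORMAT".
    start = ddl.index("(") + 1
    end = ddl.upper().index("ROW FORMAT")
    # Hive column names are case-insensitive, so compare in lower case.
    columns = [c.lower() for c in COLUMN_DEF.findall(ddl[start:end])]
    # Keep every key exactly as often as it is written, repeats included.
    keys = [k.lower() for k in XPATH_KEY.findall(ddl)]
    counts = Counter(keys)
    return {
        "columns": len(columns),
        "entries": len(keys),
        "distinct_keys": len(counts),
        "duplicates": sorted(k for k, n in counts.items() if n > 1),
        "columns_without_xpath": sorted(set(columns) - set(counts)),
        "keys_without_column": sorted(set(counts) - set(columns)),
    }


# A trimmed-down version of the question's DDL.
sample = """
CREATE EXTERNAL TABLE INFO(
uname string,
tr_codesystem string,
tr_codesystemname string
)
ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
WITH SERDEPROPERTIES (
"column.xpath.ugname"="Document/xxx/entryInfo/productCode/displayName/text()",
"column.xpath.tr_codesystem"="Document/xxx/entryInfo/productCode/translation/codeSystem/text()",
"column.xpath.tr_codesystem"="Document/xxx/entryInfo/productCode/translation/codeSystemName/text()"
)
"""

print(check_xpath_properties(sample))
```

On this trimmed sample the helper reports 3 columns and 3 written entries but only 2 distinct keys, with `tr_codesystem` as the duplicate, and it flags `uname`/`ugname` as names that appear on only one side. Applied to the full DDL above, it would report 21 columns against 20 distinct keys. It would also flag pairs such as `uname`/`ugname` and `ageForm`/`dosageForm` whose names differ between the column list and the property keys. Those may be worth a second look, although the answer reports that fixing the duplicate alone cleared the error.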

Best answer

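After a bit of digging through the code, I found the problem. I was getting this error because I had used the same XPath column name twice: `"column.xpath.tr_codesystem"` is repeated in SERDEPROPERTIES. I changed the second one to `tr_codesystemname` instead, and it started working for me.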
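A short Python analogy shows why a repeated key triggers this error. A `dict` stands in for Hive's property map here, on the assumption that each key keeps a single value and that a later entry overwrites an earlier one. This is not hivexmlserde code. It only mimics the column-count versus XPath-count comparison named in the error message, using the last four columns from the question.

```python
# Analogy only: a dict stands in for Hive's SERDEPROPERTIES map
# (assumption: one value per key, later entries overwrite earlier ones).
columns = ["tr_code", "tr_description", "tr_codesystem", "tr_codesystemname"]

written = [
    ("column.xpath.tr_code",
     "Document/xxx/entryInfo/productCode/translation/code/text()"),
    ("column.xpath.tr_description",
     "Document/xxx/entryInfo/productCode/translation/displayName/text()"),
    ("column.xpath.tr_codesystem",
     "Document/xxx/entryInfo/productCode/translation/codeSystem/text()"),
    ("column.xpath.tr_codesystem",  # the duplicated key from the question
     "Document/xxx/entryInfo/productCode/translation/codeSystemName/text()"),
]

props = dict(written)  # the second tr_codesystem entry replaces the first
print(len(columns), len(written), len(props))  # 4 4 3
print(props["column.xpath.tr_codesystem"])  # now points at codeSystemName

# The fix from the answer: give the last entry its own key.
fixed = dict(written[:-1] + [("column.xpath.tr_codesystemname", written[-1][1])])
print(len(fixed) == len(columns))  # True
```

Under the last-wins assumption, the duplicate does more than break the count. It would also leave `tr_codesystem` mapped to the `codeSystemName` path instead of `codeSystem`, so renaming the second key addresses both problems.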

Source: Stack Overflow question "xml - Error while parsing XML in Hive": https://stackoverflow.com/questions/41438880/

10-11 23:36