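I'm on EMR 5.19 with Hive 2.3.3, and NULLIF fails to convert a Java String to a Hadoop Text or vice versa. The data comes through the CloudTrail SerDe from AWS, which looks solidly written. Judging from the error message, the problem seems to lie in the built-in NULLIF UDF: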

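I'm testing whether the result of a regex extraction is an empty string, and if it is, I want a NULL instead. So my column looks something like NULLIF(REGEXP_EXTRACT(key,'([^\/]+)(\/\d+)?(\/.*)', 1), '') AS key_prefix, but I get the following error: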

2020-02-11 11:06:34,034 INFO [IPC Server handler 26 on 43627] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Diagnostics report from attempt_1574116917806_1754132_r_000008_3: Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Error evaluating NULLIF(regexp_extract(_col2, '(^[^\/]*)\/(\d\/)?([^\/][^\/]+)', 1),'')
    at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:257)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:445)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:393)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:175)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:169)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Error evaluating NULLIF(regexp_extract(_col2, '(^[^\/]*)\/(\d\/)?([^\/][^\/]+)', 1),'')
    at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:93)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:897)
    at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.internalForward(CommonJoinOperator.java:820)
    at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genUniqueJoinObject(CommonJoinOperator.java:834)
    at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genUniqueJoinObject(CommonJoinOperator.java:837)
    at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:938)
    at org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:264)
    at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:196)
    ... 7 more
Caused by: java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.hadoop.io.Text
    at org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableStringObjectInspector.getPrimitiveWritableObject(WritableStringObjectInspector.java:41)
    at org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorUtils.comparePrimitiveObjects(PrimitiveObjectInspectorUtils.java:421)
    at org.apache.hadoop.hive.ql.udf.generic.GenericUDFNullif.evaluate(GenericUDFNullif.java:93)

Best answer

This may not answer your question directly, but I hope it helps.


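regexp_extract returns the empty string '' when the regex does not match; it can return NULL only when the source string is NULL. So using NULLIF here does not look right.
Use a double backslash to escape special characters in a Hive regexp, for example \\d.
/ is not a special character and does not need to be escaped.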
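
The three points above can be checked with a few throwaway queries. This is a minimal sketch, assuming a Hive session that accepts SELECT without a FROM clause; the sample strings are made up for illustration and the pattern is the one from the question with the escaping adjusted as described.

```sql
-- '/' is written bare and '\d' is written as '\\d' inside the Hive string literal.

-- A key that matches: group 1 is the part before the first '/'.
SELECT regexp_extract('foo/123/bar', '([^/]+)(/\\d+)?(/.*)', 1);   -- expected: foo

-- A key with no '/' cannot match, so the result should be '' rather than NULL.
SELECT regexp_extract('nomatch', '([^/]+)(/\\d+)?(/.*)', 1) = '';  -- expected: true

-- A NULL source string is the only case that should yield NULL.
SELECT regexp_extract(CAST(NULL AS string), '([^/]+)(/\\d+)?(/.*)', 1) IS NULL;  -- expected: true
```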


I'd suggest a macro like this:

CREATE TEMPORARY MACRO normalize_null(s string) CASE WHEN s!='' THEN s END;


It converts an empty string to NULL and passes NULL and everything else through unchanged.
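
As a sketch of how the macro could stand in for NULLIF in the query from the question: the table name cloudtrail_logs below is hypothetical, and whether this avoids the ClassCastException on a given EMR release is an assumption that would need to be tested there.

```sql
-- Macro from the answer: any non-empty string passes through; '' and NULL both become NULL,
-- because s != '' is false for '' and NULL for a NULL input, so the CASE falls through.
CREATE TEMPORARY MACRO normalize_null(s string) CASE WHEN s != '' THEN s END;

-- Same extraction as in the question, without NULLIF and without escaping '/'.
SELECT normalize_null(regexp_extract(key, '([^/]+)(/\\d+)?(/.*)', 1)) AS key_prefix
FROM cloudtrail_logs;  -- hypothetical table name
```

A temporary macro lives only for the current session, so the CREATE statement has to be issued in the same session or script as the query that uses it.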

This is based on the Stack Overflow question "java - Are there known implementation problems with NULLIF in some versions of Hive?": https://stackoverflow.com/questions/60161108/
