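I'm using EMR 5.19 with Hive 2.3.3, and NULLIF fails to convert from a Java String to a Hadoop Text or vice versa. The source is the CloudTrail SerDe from AWS, which appears to be solidly written. As the error message shows, the problem seems to be in the built-in NULLIF UDF.
I'm testing whether the result of a regexp extract is an empty string, and if so I want a NULL, so my column looks a bit like
NULLIF(REGEXP_EXTRACT(key,'([^\/]+)(\/\d+)?(\/.*)', 1), '') AS key_prefix
but I get the following error: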
2020-02-11 11:06:34,034 INFO [IPC Server handler 26 on 43627] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Diagnostics report from attempt_1574116917806_1754132_r_000008_3: Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Error evaluating NULLIF(regexp_extract(_col2, '(^[^\/]*)\/(\d\/)?([^\/][^\/]+)', 1),'')
at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:257)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:445)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:393)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:175)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:169)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Error evaluating NULLIF(regexp_extract(_col2, '(^[^\/]*)\/(\d\/)?([^\/][^\/]+)', 1),'')
at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:93)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:897)
at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.internalForward(CommonJoinOperator.java:820)
at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genUniqueJoinObject(CommonJoinOperator.java:834)
at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genUniqueJoinObject(CommonJoinOperator.java:837)
at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:938)
at org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:264)
at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:196)
... 7 more
Caused by: java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.hadoop.io.Text
at org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableStringObjectInspector.getPrimitiveWritableObject(WritableStringObjectInspector.java:41)
at org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorUtils.comparePrimitiveObjects(PrimitiveObjectInspectorUtils.java:421)
at org.apache.hadoop.hive.ql.udf.generic.GenericUDFNullif.evaluate(GenericUDFNullif.java:93)
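Best answer
This may not answer your question directly, but I hope it helps.
regexp_extract returns an empty string '' if the regexp does not match; it can return NULL only when the source string is NULL. So using NULLIF here does not look right.
Use a double backslash to escape special characters in a Hive regexp, for example \\d. The slash / is not a special character and does not need to be escaped.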
I'd suggest a macro like this:
CREATE TEMPORARY MACRO normalize_null(s string) CASE WHEN s!='' THEN s END;
It converts an empty string to NULL, keeps NULL as NULL, and returns everything else unchanged.
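As a minimal sketch of how the macro and the escaping advice might be combined in the original query: the table name cloudtrail_logs is hypothetical, and the pattern is the one from the question rewritten with unescaped slashes and a doubled backslash before d. The source does not confirm that this avoids the ClassCastException on every Hive build, so treat it as something to try rather than a verified fix.

```sql
-- Macro from the answer: '' becomes NULL, NULL stays NULL,
-- any other string is returned unchanged.
CREATE TEMPORARY MACRO normalize_null(s string) CASE WHEN s != '' THEN s END;

-- Quick check of the three cases on literal values.
SELECT normalize_null(''),
       normalize_null(CAST(NULL AS STRING)),
       normalize_null('abc');

-- Use the macro in place of NULLIF(..., '').
-- '/' is left unescaped; '\\d' is how the regex token \d is written
-- inside a Hive string literal.
SELECT normalize_null(REGEXP_EXTRACT(key, '([^/]+)(/\\d+)?(/.*)', 1)) AS key_prefix
FROM cloudtrail_logs;  -- hypothetical table name, not from the question
```

A temporary macro exists only for the current session, so the CREATE statement has to run in the same session as the query that uses it.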
Source question on Stack Overflow: "java - Is there some known issue with the implementation of NULLIF in Hive in certain versions?" https://stackoverflow.com/questions/60161108/