Unable to use an existing Hive permanent UDF from Spark SQL

I have previously registered a UDF with Hive. It is permanent, not TEMPORARY. It works in beeline.

CREATE FUNCTION normaliseURL AS 'com.example.hive.udfs.NormaliseURL' USING JAR 'hdfs://udfs/hive-udfs.jar';

I have Spark configured to use the Hive metastore. The config is working, as I can query Hive tables. I can see the UDF:

In [9]: spark.sql('describe function normaliseURL').show(truncate=False)
+-------------------------------------------+
|function_desc                              |
+-------------------------------------------+
|Function: default.normaliseURL             |
|Class: com.example.hive.udfs.NormaliseURL  |
|Usage: N/A.                                |
+-------------------------------------------+

However, I cannot use the UDF in a SQL statement:

spark.sql('SELECT normaliseURL("value")')
AnalysisException: "Undefined function: 'default.normaliseURL'. This function is neither a registered temporary function nor a permanent function registered in the database 'default'.; line 1 pos 7"

If I attempt to register the UDF with Spark (bypassing the metastore), it fails to register it, suggesting that it does already exist:

In [12]: spark.sql("create function normaliseURL as 'com.example.hive.udfs.NormaliseURL'")
AnalysisException: "Function 'default.normaliseURL' already exists in database 'default';"

I'm using Spark 2.0 and Hive metastore 1.1.0. The UDF is Scala; my Spark driver code is Python.

I'm stumped. Am I correct in my assumption that Spark can utilise metastore-defined permanent UDFs? Am I creating the function correctly in Hive?

Solution

The issue is that Spark 2.0 is not able to execute functions whose JARs are located on HDFS.

Spark SQL: Thriftserver unable to run a registered Hive UDTF

One workaround is to define the function as a temporary function in the Spark job, with the jar path pointing to a local edge-node path, and then call the function in the same Spark job:

CREATE TEMPORARY FUNCTION functionName AS 'com.test.HiveUDF' USING JAR '/user/home/dir1/functions.jar'
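As a rough illustration of that workaround, here is a minimal PySpark sketch. The class name com.example.hive.udfs.NormaliseURL is taken from the question; the local jar path /tmp/hive-udfs.jar is an assumed placeholder for wherever the jar actually sits on the edge node.

from pyspark.sql import SparkSession

# Hive support is required for CREATE TEMPORARY FUNCTION ... USING JAR.
spark = (SparkSession.builder
         .appName("temp-udf-workaround")
         .enableHiveSupport()
         .getOrCreate())

# Register the UDF as a temporary function, pointing at a jar on the
# local edge node rather than on HDFS (placeholder path, adjust as needed).
spark.sql("""
    CREATE TEMPORARY FUNCTION normaliseURL
    AS 'com.example.hive.udfs.NormaliseURL'
    USING JAR '/tmp/hive-udfs.jar'
""")

# Call the function within the same Spark job/session.
spark.sql("SELECT normaliseURL('http://Example.COM/path') AS url").show(truncate=False)

Because the function is registered as TEMPORARY, it only exists for the lifetime of this session, so the registration and the query that uses it must run in the same job.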