Problem description
I need to hook a custom execution hook into Apache Hive. Please let me know if somebody knows how to do it.
The current environment I am using is given below:
Hadoop: Cloudera version 4.1.2
Operating system: CentOS
Thanks, Arun
Recommended answer
There are several types of hooks depending on at which stage you want to inject your custom code:
- Driver run hooks (pre/post)
- Semantic analyzer hooks (pre/post)
- Execution hooks (pre/failure/post)
- Client statistics publishers
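For orientation, these hook types correspond to the following interfaces (paraphrased here from the Hive 0.11 source tree, each in its own file; double-check package names and signatures against the version you actually run):

    // org.apache.hadoop.hive.ql -- driver run hooks
    public interface HiveDriverRunHook extends Hook {
      void preDriverRun(HiveDriverRunHookContext hookContext) throws Exception;
      void postDriverRun(HiveDriverRunHookContext hookContext) throws Exception;
    }

    // org.apache.hadoop.hive.ql.parse -- semantic analyzer hooks (usually implemented by extending AbstractSemanticAnalyzerHook)
    public interface HiveSemanticAnalyzerHook extends Hook {
      ASTNode preAnalyze(HiveSemanticAnalyzerHookContext context, ASTNode ast) throws SemanticException;
      void postAnalyze(HiveSemanticAnalyzerHookContext context,
          List<Task<? extends Serializable>> rootTasks) throws SemanticException;
    }

    // org.apache.hadoop.hive.ql.hooks -- pre/failure/post execution hooks
    public interface ExecuteWithHookContext extends Hook {
      void run(HookContext hookContext) throws Exception;
    }

    // org.apache.hadoop.hive.ql.stats -- client statistics publishers
    public interface ClientStatsPublisher {
      void run(Map<String, Double> counterValues, String jobID);
    }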
If you run a script, the processing flow looks as follows:
1. Driver.run() takes the command
2. HiveDriverRunHook.preDriverRun() (HiveConf.ConfVars.HIVE_DRIVER_RUN_HOOKS)
3. Driver.compile() starts processing the command: creates the abstract syntax tree
4. AbstractSemanticAnalyzerHook.preAnalyze() (HiveConf.ConfVars.SEMANTIC_ANALYZER_HOOK)
5. Semantic analysis
6. AbstractSemanticAnalyzerHook.postAnalyze() (HiveConf.ConfVars.SEMANTIC_ANALYZER_HOOK)
7. Create and validate the query plan (physical plan)
8. Driver.execute(): ready to run the jobs
9. ExecuteWithHookContext.run() (HiveConf.ConfVars.PREEXECHOOKS)
10. ExecDriver.execute() runs all the jobs
    - For each job, at every HiveConf.ConfVars.HIVECOUNTERSPULLINTERVAL interval, ClientStatsPublisher.run() is called to publish statistics (HiveConf.ConfVars.CLIENTSTATSPUBLISHERS)
    - If a task fails: ExecuteWithHookContext.run() (HiveConf.ConfVars.ONFAILUREHOOKS)
11. Finish all the tasks
12. ExecuteWithHookContext.run() (HiveConf.ConfVars.POSTEXECHOOKS)
13. Before returning the result: HiveDriverRunHook.postDriverRun() (HiveConf.ConfVars.HIVE_DRIVER_RUN_HOOKS)
14. Return the result.
For each of the hooks I indicated the interfaces you have to implement. In the brackets there's the corresponding conf. prop. key you have to set in order to register the class at the beginning of the script. E.g.: setting the PreExecution hook (9th stage of the workflow):
HiveConf.ConfVars.PREEXECHOOKS -> hive.exec.pre.hooks :
set hive.exec.pre.hooks=com.example.MyPreHook;
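To make this concrete, here is a minimal sketch of what such a pre-execution hook class could look like (the class name com.example.MyPreHook just mirrors the registration example above and is otherwise made up; it assumes the hive-exec jar of your Hive version is on the compile classpath):

    package com.example;

    import org.apache.hadoop.hive.ql.hooks.ExecuteWithHookContext;
    import org.apache.hadoop.hive.ql.hooks.HookContext;

    // Minimal pre-execution hook: runs right before the jobs are launched
    // (stage 9 above) and simply prints some information about the query.
    public class MyPreHook implements ExecuteWithHookContext {

      @Override
      public void run(HookContext hookContext) throws Exception {
        // HookContext gives access to the query plan, the read/write entities
        // and the HiveConf of the current query.
        System.out.println("Query id : " + hookContext.getQueryPlan().getQueryId());
        System.out.println("Inputs   : " + hookContext.getInputs());
        System.out.println("Outputs  : " + hookContext.getOutputs());
      }
    }

The compiled class has to be on Hive's classpath before the set statement is executed, e.g. by packaging it into a jar and loading it with add jar /path/to/my-hooks.jar; (or via hive.aux.jars.path).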
Unfortunately these features aren't really documented, but you can always look into the Driver class to see the evaluation order of the hooks.
Remark: I assumed Hive 0.11.0 here; I don't think the Cloudera distribution differs (too much).