This article describes how to work with Hive execution hooks. It should be a useful reference for anyone who needs to solve the same problem; read on to learn how.

Problem Description

I need to hook a custom execution hook into Apache Hive. Please let me know if somebody knows how to do it.

The current environment I am using is given below:

Hadoop: Cloudera version 4.1.2
Operating system: CentOS

Thanks, Arun

Recommended Answer

There are several types of hooks, depending on the stage at which you want to inject your custom code:

  • Driver run hooks (pre/post), sketched right after this list
  • Semantic analyzer hooks (pre/post)
  • Execution hooks (pre/failure/post)
  • Client statistics publishers
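
For illustration, here is a minimal sketch of the first hook type, a driver run hook. The class name and the println calls are my own assumptions; the interface and its two callback methods are the ones Hive exposes:

package com.example;

import org.apache.hadoop.hive.ql.HiveDriverRunHook;
import org.apache.hadoop.hive.ql.HiveDriverRunHookContext;

// Hypothetical example class; register it via hive.exec.driver.run.hooks
public class MyDriverRunHook implements HiveDriverRunHook {

  // Called before Driver.run() starts processing the command
  public void preDriverRun(HiveDriverRunHookContext hookContext) throws Exception {
    System.out.println("About to run: " + hookContext.getCommand());
  }

  // Called after execution, just before the result is returned
  public void postDriverRun(HiveDriverRunHookContext hookContext) throws Exception {
    System.out.println("Finished: " + hookContext.getCommand());
  }
}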

If you run a script, the processing flow looks as follows (a sketch of an execution hook follows the numbered list):

  1. Driver.run() takes the command
  2. HiveDriverRunHook.preDriverRun()
    (HiveConf.ConfVars.HIVE_DRIVER_RUN_HOOKS)
  3. Driver.compile() starts processing the command: creates the abstract syntax tree
  4. AbstractSemanticAnalyzerHook.preAnalyze()
    (HiveConf.ConfVars.SEMANTIC_ANALYZER_HOOK)
  5. Semantic analysis
  6. AbstractSemanticAnalyzerHook.postAnalyze()
    (HiveConf.ConfVars.SEMANTIC_ANALYZER_HOOK)
  7. Create and validate the query plan (physical plan)
  8. Driver.execute() : ready to run the jobs
  9. ExecuteWithHookContext.run()
    (HiveConf.ConfVars.PREEXECHOOKS)
  10. ExecDriver.execute() runs all the jobs
  11. For each job at every HiveConf.ConfVars.HIVECOUNTERSPULLINTERVAL interval:
    ClientStatsPublisher.run() is called to publish statistics
    (HiveConf.ConfVars.CLIENTSTATSPUBLISHERS)
    If a task fails: ExecuteWithHookContext.run()
    (HiveConf.ConfVars.ONFAILUREHOOKS)
  12. Finish all the tasks
  13. ExecuteWithHookContext.run()
    (HiveConf.ConfVars.POSTEXECHOOKS)
  14. Before returning the result: HiveDriverRunHook.postDriverRun()
    (HiveConf.ConfVars.HIVE_DRIVER_RUN_HOOKS)
  15. Return the result.
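
As a concrete illustration of steps 9, 11 (failure case) and 13: an execution hook is a class implementing ExecuteWithHookContext. The sketch below is an assumed minimal example (it reuses the com.example.MyPreHook name from the registration example further down, and the log output is mine); it only reads a few fields from the HookContext that Hive passes in:

package com.example;

import org.apache.hadoop.hive.ql.hooks.ExecuteWithHookContext;
import org.apache.hadoop.hive.ql.hooks.HookContext;

// Hypothetical example class; register it via hive.exec.pre.hooks,
// hive.exec.post.hooks or hive.exec.failure.hooks
public class MyPreHook implements ExecuteWithHookContext {

  // Hive calls run() at whichever stage the hook is registered for
  public void run(HookContext hookContext) throws Exception {
    // PRE_EXEC_HOOK, POST_EXEC_HOOK or ON_FAILURE_HOOK
    System.out.println("Hook type: " + hookContext.getHookType());
    System.out.println("Query id : " + hookContext.getQueryPlan().getQueryId());
    System.out.println("Inputs   : " + hookContext.getInputs());
    System.out.println("Outputs  : " + hookContext.getOutputs());
  }
}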

For each of the hooks I indicated the interface you have to implement. In the brackets there's the corresponding conf. prop. key you have to set in order to register the class at the beginning of the script. E.g.: setting the PreExecution hook (9th stage of the workflow):

HiveConf.ConfVars.PREEXECHOOKS -> hive.exec.pre.hooks :
set hive.exec.pre.hooks=com.example.MyPreHook;
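
The same registration pattern applies to the other keys listed above (hive.exec.post.hooks, hive.exec.failure.hooks, hive.exec.driver.run.hooks, hive.semantic.analyzer.hook, hive.client.stats.publishers). As far as I know, each key accepts a comma-separated list of fully qualified class names, and the hook classes have to be on Hive's classpath (for example via an auxiliary JAR) before the statement runs.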

Unfortunately these features aren't really documented, but you can always look into the Driver class to see the evaluation order of the hooks.

Remark: I assumed Hive 0.11.0 here; I don't think that the Cloudera distribution differs (too much).

That concludes this article on Hive execution hooks. We hope the recommended answer is helpful, and thank you for your continued support!
