"Too many fetch failures" when using Hive

Problem description

I'm running a Hive query against a 3-node Hadoop cluster, and I'm getting an error that says "Too many fetch failures". My Hive query is:

  insert overwrite table tablename1 partition(namep)
  select id,name,substring(name,5,2) as namep from tablename2;

That's the query I'm trying to run. All I want to do is transfer data from tablename2 to tablename1. Any help is appreciated.

Recommended answer

This can be caused by various Hadoop configuration issues. Here are a couple to look for in particular:

  • DNS issue: check your /etc/hosts
  • Not enough HTTP threads on the mapper side to serve the reducers

Some suggested fixes (from Cloudera troubleshooting):

  • Set mapred.reduce.slowstart.completed.maps = 0.80
  • tasktracker.http.threads = 80
  • mapred.reduce.parallel.copies = sqrt(number of nodes), but in any case >= 10
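The arithmetic behind the third setting can be sketched in a few lines of Python. This is only an illustration of the list above: the function names are hypothetical, rounding the square root up is an assumption, and the note that tasktracker.http.threads is a TaskTracker daemon setting (so it goes in mapred-site.xml rather than in a Hive session) reflects Hadoop 1.x behaviour and should be checked against your distribution.

```python
import math


def suggested_shuffle_settings(node_count):
    """Return the three suggested settings for a cluster of node_count nodes."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    # sqrt(number of nodes), rounded up (assumption), but never below 10.
    parallel_copies = max(10, math.ceil(math.sqrt(node_count)))
    return {
        "mapred.reduce.slowstart.completed.maps": "0.80",
        "tasktracker.http.threads": "80",
        "mapred.reduce.parallel.copies": str(parallel_copies),
    }


def as_hive_set_statements(settings):
    """Format the job-level settings as Hive SET statements."""
    # tasktracker.http.threads is read by the TaskTracker daemon at startup
    # (assumption: Hadoop 1.x), so it is left out of the per-session list.
    job_level = {k: v for k, v in settings.items() if k != "tasktracker.http.threads"}
    return [f"SET {key}={value};" for key, value in job_level.items()]


if __name__ == "__main__":
    # The 3-node cluster from the question: sqrt(3) is below 10, so the floor applies.
    for line in as_hive_set_statements(suggested_shuffle_settings(3)):
        print(line)
```

For the 3-node cluster in the question, the floor of 10 is what matters; the square root only takes over above 100 nodes.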

Here is a link to the troubleshooting slides for more details:

http://www.slideshare.net/cloudera/hadoop-troubleshooting-101-kate-ting-cloudera

Update for 2020: Things have changed a lot and AWS mostly rules the roost. Here is some troubleshooting for it:

https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-troubleshoot-error-resource-1.html

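Too many fetch failures

The presence of "Too many fetch-failures" or "Error reading task output" error messages in step or task attempt logs indicates that the running task depends on the output of another task. This often occurs when a reduce task is queued to execute and requires the output of one or more map tasks, and that output is not yet available.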

There are several reasons the output may not be available:

  • The prerequisite task is still processing. This is often a map task.
  • The data may be unavailable due to poor network connectivity if it is located on a different instance.
  • If HDFS is used to retrieve the output, there may be an issue with HDFS.

The most common cause of this error is that the previous task is still processing. This is especially likely if the errors occur when the reduce tasks are first trying to run. You can check whether this is the case by reviewing the syslog log for the cluster step that is returning the error. If the syslog shows both map and reduce tasks making progress, the reduce phase has started while some map tasks have not yet completed.

One thing to look for in the logs is a map progress percentage that goes to 100% and then drops back to a lower value. A map percentage of 100% does not mean that all map tasks are completed. It simply means that Hadoop is executing all the map tasks. If this value drops back below 100%, a map task has failed and, depending on the configuration, Hadoop may try to reschedule the task. If the map percentage stays at 100% in the logs, look at the CloudWatch metrics, specifically RunningMapTasks, to check whether the map task is still processing. You can also find this information using the Hadoop web interface on the master node.
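The check for a map percentage that falls back after reaching a higher value can be automated. The sketch below is a minimal example, assuming the log contains the usual job client progress lines of the form "map 100% reduce 33%"; the function name and the sample lines are illustrative and not taken from a real cluster.

```python
import re

# Matches progress lines such as "map 100% reduce 33%".
PROGRESS = re.compile(r"map\s+(\d+)%\s+reduce\s+(\d+)%")


def find_map_regressions(log_lines):
    """Return (line_number, previous_pct, current_pct) for every drop in map progress."""
    regressions = []
    previous = None
    for number, line in enumerate(log_lines, start=1):
        match = PROGRESS.search(line)
        if match is None:
            continue  # not a progress line
        current = int(match.group(1))
        if previous is not None and current < previous:
            regressions.append((number, previous, current))
        previous = current
    return regressions


# Illustrative lines only (assumption), not real cluster output.
sample = [
    "INFO mapred.JobClient:  map 98% reduce 30%",
    "INFO mapred.JobClient:  map 100% reduce 32%",
    "Too many fetch-failures",
    "INFO mapred.JobClient:  map 97% reduce 32%",
]
print(find_map_regressions(sample))  # [(4, 100, 97)]
```

An empty result with the map percentage pinned at 100% is the case where the RunningMapTasks metric is the next thing to check.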

If you are seeing this issue, there are several things you can try:

  • Instruct the reduce phase to wait longer before starting. You can do this by raising the Hadoop configuration setting mapred.reduce.slowstart.completed.maps, which is the fraction of map tasks that must complete before reducers are scheduled. For more information, see Create Bootstrap Actions to Install Additional Software.
  • Match the reducer count to the total reducer capability of the cluster. You do this by adjusting the Hadoop configuration setting mapred.reduce.tasks for the job.
  • Use a combiner class to minimize the amount of output that needs to be fetched.
  • Check that there are no issues with the Amazon EC2 service that are affecting the network performance of the cluster. You can do this using the Service Health Dashboard.
  • Review the CPU and memory resources of the instances in your cluster to make sure that your data processing is not overwhelming the resources of your nodes. For more information, see Configure Cluster Hardware and Networking.
  • Check the version of the Amazon Machine Image (AMI) used in your Amazon EMR cluster. If the version is 2.3.0 through 2.4.4 inclusive, update to a later version. AMI versions in this range use a version of Jetty that may fail to deliver output from the map phase. The fetch error occurs when the reducers cannot obtain output from the map phase.

Jetty is an open-source HTTP server that is used for machine-to-machine communication within a Hadoop cluster.
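For the suggestion to match the reducer count to the cluster's reducer capability, a rough calculation may help. The sketch below assumes a Hadoop 1.x cluster where capacity is the number of nodes times mapred.tasktracker.reduce.tasks.maximum, and uses the 0.95 factor from the Hadoop MapReduce tutorial; the function name and the slot count of 2 per node are assumptions, so substitute your own values.

```python
def reducer_count(node_count, reduce_slots_per_node, factor=0.95):
    """Estimate mapred.reduce.tasks from the cluster's total reduce slots."""
    if node_count < 1 or reduce_slots_per_node < 1:
        raise ValueError("node_count and reduce_slots_per_node must be at least 1")
    capacity = node_count * reduce_slots_per_node
    # Stay slightly under full capacity so all reducers can start in one wave.
    return max(1, int(capacity * factor))


if __name__ == "__main__":
    # 3 nodes with 2 reduce slots each (assumed slot count).
    print(f"SET mapred.reduce.tasks={reducer_count(3, 2)};")
```

The printed line can be run in the Hive session before the insert statement, so the setting applies only to that job.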

