We have a 1 GB CSV file that we are trying to load into a Hive table. The file contains 656 columns. First we move the data into a temporary table using the Hive query below.
use ${hiveconf:database_name};
set table_name = table_name;
LOAD DATA LOCAL INPATH "${hiveconf:path}" OVERWRITE INTO TABLE ${hiveconf:table_name};
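For context, the DDL of the temp table is not shown above; a plain text-format table along these lines is assumed (the column types and the comma delimiter are guesses, not taken from the original scripts):

-- Hypothetical DDL for TEMP_TABLE; not part of the original scripts.
-- With a TEXTFILE table like this, LOAD DATA only moves the file into the
-- table's directory, so the 656 columns are parsed later, at query time.
CREATE TABLE TEMP_TABLE (
  COLUMN1   STRING,
  COLUMN2   STRING,
  -- ... columns 3 through 655 ...
  COLUMN656 STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE;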
Then the data is moved from the temp table to the staging table using the following query.
use ${hiveconf:database_name};
SET mapred.job.queue.name=root.dev;
set hive.exec.max.dynamic.partitions.pernode = 500;
SET hive.variable.substitute.depth=100;
SET PATTERN='\\^';
SET REPLACEMENT='';
INSERT OVERWRITE TABLE STAGING_TABLE partition(FILE_NAME="${hiveconf:PARTITION_BY}")
SELECT
COLUMN1,
COLUMN2,
..
COLUMN656
FROM TEMP_TABLE;
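The PATTERN and REPLACEMENT variables are set above but never appear in the visible part of the SELECT list. If they are meant to strip ^ characters from the data, the elided columns presumably look something like this (a sketch only; the per-column regexp_replace calls are an assumption, not shown in the original):

-- Sketch of how the PATTERN/REPLACEMENT variables set above would
-- typically be applied. Since PATTERN='\\^' and REPLACEMENT='' include
-- the quotes, the substitution yields valid string literals here.
SELECT
regexp_replace(COLUMN1, ${hiveconf:PATTERN}, ${hiveconf:REPLACEMENT}) AS COLUMN1,
regexp_replace(COLUMN2, ${hiveconf:PATTERN}, ${hiveconf:REPLACEMENT}) AS COLUMN2,
..
regexp_replace(COLUMN656, ${hiveconf:PATTERN}, ${hiveconf:REPLACEMENT}) AS COLUMN656
FROM TEMP_TABLE;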
When the above script is executed, it fails with the error below.
Logging initialized using configuration in file:/opt/mapr/hive/hive-0.13/conf/hive-log4j.properties
OK
Time taken: 0.486 seconds
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1455125666889_268626, Tracking URL =
Kill Command = /opt/mapr/bin/hadoop job -kill job_1455125666889_268626
Hadoop job information for Stage-1: number of mappers: 7; number of reducers: 0
2016-03-29 01:46:37,753 Stage-1 map = 0%, reduce = 0%
2016-03-29 01:47:11,979 Stage-1 map = 14%, reduce = 0%, Cumulative CPU 599.7 sec
2016-03-29 01:47:15,076 Stage-1 map = 29%, reduce = 0%, Cumulative CPU 669.22 sec
2016-03-29 01:47:18,169 Stage-1 map = 50%, reduce = 0%, Cumulative CPU 738.77 sec
2016-03-29 01:47:19,200 Stage-1 map = 57%, reduce = 0%, Cumulative CPU 753.23 sec
2016-03-29 01:47:46,028 Stage-1 map = 71%, reduce = 0%, Cumulative CPU 1366.8 sec
2016-03-29 01:47:47,067 Stage-1 map = 79%, reduce = 0%, Cumulative CPU 1388.92 sec
2016-03-29 01:47:51,216 Stage-1 map = 86%, reduce = 0%, Cumulative CPU 1429.25 sec
2016-03-29 01:47:52,245 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1470.08 sec
MapReduce Total cumulative CPU time: 24 minutes 30 seconds 80 msec
Ended Job = job_1455125666889_268626
Stage-4 is filtered out by condition resolver.
Stage-3 is selected by condition resolver.
Stage-5 is filtered out by condition resolver.
Launching Job 3 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1455125666889_268633, Tracking URL =
Kill Command = /opt/mapr/bin/hadoop job -kill job_1455125666889_268633
Hadoop job information for Stage-3: number of mappers: 1; number of reducers: 0
2016-03-29 01:48:01,025 Stage-3 map = 0%, reduce = 0%
2016-03-29 01:49:01,552 Stage-3 map = 0%, reduce = 0%, Cumulative CPU 240.81 sec
2016-03-29 01:49:11,808 Stage-3 map = 16%, reduce = 0%, Cumulative CPU 300.08 sec
2016-03-29 01:49:17,956 Stage-3 map = 0%, reduce = 0%
2016-03-29 01:50:18,409 Stage-3 map = 0%, reduce = 0%, Cumulative CPU 243.14 sec
2016-03-29 01:50:25,577 Stage-3 map = 16%, reduce = 0%, Cumulative CPU 284.99 sec
2016-03-29 01:50:31,717 Stage-3 map = 0%, reduce = 0%
2016-03-29 01:51:32,060 Stage-3 map = 0%, reduce = 0%, Cumulative CPU 255.38 sec
2016-03-29 01:51:41,264 Stage-3 map = 16%, reduce = 0%, Cumulative CPU 302.65 sec
2016-03-29 01:51:47,396 Stage-3 map = 0%, reduce = 0%
2016-03-29 01:52:47,713 Stage-3 map = 0%, reduce = 0%, Cumulative CPU 230.81 sec
2016-03-29 01:53:03,040 Stage-3 map = 100%, reduce = 0%
MapReduce Total cumulative CPU time: 3 minutes 50 seconds 810 msec
Ended Job = job_1455125666889_268633 with errors
Error during job, obtaining debugging information...
Examining task ID: task_1455125666889_268633_m_000000 (and more) from job job_1455125666889_268633
Task with the most failures(4):
-----
Task ID:
task_1455125666889_268633_m_000000
-----
Diagnostic Messages for this Task:
Error: GC overhead limit exceeded
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Job 0: Map: 7 Cumulative CPU: 1478.78 sec MAPRFS Read: 0 MAPRFS Write: 0 SUCCESS
Job 1: Map: 1 Cumulative CPU: 230.81 sec MAPRFS Read: 0 MAPRFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 28 minutes 29 seconds 590 msec
The query above completes without any problem when the file is smaller than 300 MB. Once the file size exceeds 300 MB, we hit the GC limit error.
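For reference, the knob that normally decides whether a map task survives this error is the per-mapper JVM heap. A minimal sketch of raising it (the property names assume Hadoop 2/YARN; the values are illustrative, not tuned for this cluster):

-- Illustrative only; actual values depend on the cluster's container limits.
SET mapreduce.map.memory.mb=4096;       -- YARN container size per map task
SET mapreduce.map.java.opts=-Xmx3686m;  -- JVM heap, kept below the container size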
When we asked the infrastructure team, we were told to rewrite our query. Can someone explain what we are doing wrong in the query above?
Thanks in advance.
Best Answer
Regarding "hadoop - Getting GC overhead limit exceeded error when moving data from temp table to staging table", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/36325182/