Problem description
I'm running a Spark job. It shows that all of the jobs were completed:
However, after a couple of minutes the entire job restarts; this time it again shows all jobs and tasks as completed, but after a couple of minutes it fails. I found this exception in the logs:
java.util.concurrent.TimeoutException: Futures timed out after [300 seconds]
This happens when I'm trying to join two pretty big tables: one of 3B rows and the second of 200M rows. When I run show(100) on the resulting dataframe, everything gets evaluated and I run into this issue.
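For reference, a minimal sketch of the kind of join described here, using the Spark 1.4 DataFrame API; the names bigDf, mediumDf and the join column "id" are placeholders, not taken from the actual code:

// Hypothetical dataframes standing in for the 3B-row and 200M-row tables.
val joined = bigDf.join(mediumDf, bigDf("id") === mediumDf("id"))
joined.show(100)  // show() triggers evaluation of the whole join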
I tried playing around with increasing/decreasing the number of partitions, and I changed the garbage collector to G1 with an increased number of threads. I changed spark.sql.broadcastTimeout to 600 (which changed the timeout message to 600 seconds).
I also read that this might be a communication issue, but other show() calls that run prior to this code segment work without problems, so that's probably not it.
This is the submit command:
/opt/spark/spark-1.4.1-bin-hadoop2.3/bin/spark-submit --master yarn-cluster --class className --executor-memory 12g --executor-cores 2 --driver-memory 32g --driver-cores 8 --num-executors 40 --conf "spark.executor.extraJavaOptions=-XX:+UseG1GC -XX:ConcGCThreads=20" /home/asdf/fileName-assembly-1.0.jar
You can get an idea of the Spark version and the resources used from it.
Where do I go from here? Any help would be appreciated, and I can provide code segments/additional logging if needed.
Recommended answer
What eventually solved this was persisting both data frames before the join.
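A sketch of what that looks like, reusing the placeholder names from the question's sketch; the storage level is illustrative, not taken from the original code:

import org.apache.spark.storage.StorageLevel

// Persist both sides and force materialization with an action before joining.
bigDf.persist(StorageLevel.MEMORY_AND_DISK)
mediumDf.persist(StorageLevel.MEMORY_AND_DISK)
bigDf.count()
mediumDf.count()

// With both inputs cached, the planner picked a shuffle join instead of
// attempting to broadcast one side (see the plans discussed below).
val joined = bigDf.join(mediumDf, bigDf("id") === mediumDf("id"))
joined.show(100)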
I looked at the execution plan before and after persisting the data frames, and the strange thing was that before persisting, Spark tried to perform a BroadcastHashJoin, which clearly failed due to the large size of the data frames; after persisting, the execution plan showed that the join would be a ShuffleHashJoin, and it completed without any issues whatsoever. A bug? Maybe. I'll try with a newer Spark version when I get to it.
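For anyone wanting to verify which strategy the planner picked, explain() prints the physical plan of a dataframe; a quick sketch (joined being the result of the join above):

// Look for BroadcastHashJoin vs ShuffleHashJoin in the printed physical plan.
joined.explain()
// explain(true) also prints the logical and optimized plans.
joined.explain(true)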