This article describes how to handle the Spark "Executor heartbeat timed out" error and may serve as a useful reference if you run into the same problem.
Problem Description
I have a simple, reproducible Spark error. (Spark 2.0 + Amazon EMR 5.0, FYI)
import pyspark.sql.types
from pyspark.sql import SQLContext

def row_parse_function(line):
    # Custom row parsing function. Details omitted.
    return pyspark.sql.types.Row(...)

if __name__ == "__main__":
    # build_spark_context and AttribPixelMergedStructType are the asker's own
    # helper and schema; their definitions are omitted here.
    spark_context = build_spark_context("max value bug isolation")
    spark_sql_context = SQLContext(spark_context)

    full_source_path = "s3a://my-bucket/ten_gb_data_file.txt.gz"

    # Tried changing the partition parameter to no effect.
    raw_rdd = spark_context.textFile(full_source_path, 5000)
    row_rdd = raw_rdd.map(row_parse_function).filter(bool)
    data_frame = spark_sql_context.createDataFrame(row_rdd, AttribPixelMergedStructType)

    # Tried removing and changing this repartition call to no effect.
    data_frame.repartition(5000)

    # Removing this cache call makes this small sample work.
    data_frame.cache()
    data_frame_count = data_frame.count()
This fails with:
ExecutorLostFailure (executor 5 exited caused by one of the running tasks) Reason: Executor heartbeat timed out after 169068 ms
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1450)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1438)
I know the heartbeat timed-out error usually means the worker died, typically due to lack of memory. How do I resolve this?
Recommended Answer
You can increase the executor and network timeouts. Also, if memory is tight, it is recommended to use persist(MEMORY_AND_DISK_SER) so that partitions which do not fit in memory are saved to disk instead of being dropped.
--conf spark.network.timeout=10000000 --conf spark.executor.heartbeatInterval=10000000 --conf spark.driver.maxResultSize=4g
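As a rough illustration, here is a minimal PySpark sketch of how these settings and the disk-spilling persist might be applied in the application itself rather than on the spark-submit command line. It assumes Spark 2.x; the application name and input path are placeholders, and the timeout values are simply the ones quoted in the answer above.

from pyspark import SparkConf, StorageLevel
from pyspark.sql import SparkSession

# Hypothetical sketch: raise the network timeout and heartbeat interval
# (values taken from the answer above) and allow the driver to collect
# larger results.
conf = (
    SparkConf()
    .setAppName("heartbeat-timeout-repro")  # placeholder app name
    .set("spark.network.timeout", "10000000")
    .set("spark.executor.heartbeatInterval", "10000000")
    .set("spark.driver.maxResultSize", "4g")
)
spark = SparkSession.builder.config(conf=conf).getOrCreate()

data_frame = spark.read.text("s3a://my-bucket/ten_gb_data_file.txt.gz")  # placeholder input

# persist(MEMORY_AND_DISK_SER) instead of cache(): partitions that do not
# fit in executor memory are written to local disk rather than dropped.
data_frame.persist(StorageLevel.MEMORY_AND_DISK_SER)
print(data_frame.count())

Note that in practice spark.executor.heartbeatInterval is usually kept well below spark.network.timeout, since heartbeats must arrive within the network timeout for the executor to be considered alive.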
That concludes this article on the Spark "Executor heartbeat timed out" error; we hope the recommended answer above is helpful.