
Problem Description

When performing a shuffle my Spark job fails and says "no space left on device", but when I run df -h it says I have free space left! Why does this happen, and how can I fix it?

Recommended Answer

You need to also monitor df -i which shows how many inodes are in use.
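To see this in practice, compare the two views of the same disk. The path below is just an example; point it at whatever disk Spark writes its shuffle files to (its `spark.local.dir`):

```shell
# "No space left on device" during a shuffle can mean inode exhaustion,
# not full blocks. Compare block usage with inode usage:
df -h /tmp   # block usage -- may show plenty of free space
df -i /tmp   # inode usage -- check the IUse% column for 100%
```

If `df -h` shows free space but `df -i` shows the inode table is full, no new files can be created even though blocks remain.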

On each machine we create M * R temporary files for the shuffle, where M = number of map tasks and R = number of reduce tasks.
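The arithmetic above explains why inodes run out quickly; a small illustration (plain Python, not Spark API — the function name is made up for this sketch):

```python
def shuffle_files(map_tasks, reduce_tasks, consolidate=False):
    """Files created per machine during a shuffle: M * R without
    consolidation, dropping to O(R) when files are consolidated."""
    return reduce_tasks if consolidate else map_tasks * reduce_tasks

# A modest job can already create hundreds of thousands of tiny files:
print(shuffle_files(1000, 200))                    # 200000
print(shuffle_files(1000, 200, consolidate=True))  # 200
```

Each of those files consumes an inode regardless of how small it is, which is why the inode table can fill while `df -h` still reports free blocks.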

If you do indeed see that disks are running out of inodes, then to fix the problem you can:

  • Decrease partitions (see coalesce with shuffle = false).
  • One can drop the number to O(R) by "consolidating files". As different file-systems behave differently it’s recommended that you read up on spark.shuffle.consolidateFiles and see https://spark-project.atlassian.net/secure/attachment/10600/Consolidating%20Shuffle%20Files%20in%20Spark.pdf.
  • Sometimes you may simply find that you need your DevOps to increase the number of inodes the FS supports.
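Before Spark 1.6, consolidation was enabled through a configuration flag. A sketch of the relevant line in `spark-defaults.conf` (only applicable to Spark versions before 1.6, since the option was later removed):

```
# spark-defaults.conf -- Spark < 1.6 only; the option was removed in 1.6
spark.shuffle.consolidateFiles  true
```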

Edit

Consolidating files has been removed from Spark since version 1.6: https://issues.apache.org/jira/browse/SPARK-9808
