Question
I'm running a pyspark 2.2.0 job in Apache Spark local mode and see the following warning:
WARN RowBasedKeyValueBatch: Calling spill() on RowBasedKeyValueBatch. Will not spill but return 0.
What could be the reason for this warning? Is this something I should care about, or can I safely ignore it?
Answer
As indicated here, this warning means that your RAM is full and that part of the RAM contents is being moved to disk.
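In local mode the whole job runs inside a single driver JVM, so one common way to give Spark more headroom before it resorts to spilling is to raise `spark.driver.memory`. A minimal sketch, assuming the `4g` value is just an example to be tuned for your machine:

```python
# Hedged sketch: allocate more driver memory for a local-mode job.
# "4g" is an illustrative value, not a recommendation.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .master("local[*]")
    # Driver memory must be set before the JVM starts; if a JVM is
    # already running, pass --driver-memory to spark-submit instead.
    .config("spark.driver.memory", "4g")
    .appName("spill-warning-demo")
    .getOrCreate()
)
```

Note that this only raises the ceiling; with large enough data, spilling (and the associated warning) can still occur.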
See also the Spark FAQ:
Does my data need to fit in memory to use Spark?
No. Spark's operators spill data to disk if it does not fit in memory, allowing it to run well on data of any size. Likewise, cached datasets that do not fit in memory are either spilled to disk or recomputed on the fly when needed, as determined by the RDD's storage level.