Spark heap memory configuration and Tungsten

Question

I thought that with the integration of Project Tungsten, Spark would automatically use off-heap memory.

What are spark.memory.offHeap.size and spark.memory.offHeap.enabled for? Do I need to manually specify the amount of off-heap memory for Tungsten here?

Recommended answer

Spark/Tungsten uses Encoders/Decoders to represent JVM objects as highly specialized Spark SQL type objects, which can then be serialized and operated on in a highly performant way. The internal format is very efficient and friendly to GC memory utilization.

Thus, even in the default on-heap mode, Tungsten removes much of the overhead of the JVM object memory layout and of GC time. In that mode Tungsten still allocates objects on the heap for its internal purposes, and the allocated memory chunks can be large, but allocation happens much less frequently and the chunks survive GC generation transitions smoothly. This almost eliminates the need to consider moving these internal structures off-heap.

In our experiments with this mode on and off, we did not see a considerable run-time improvement. What you do get with off-heap mode on is the need to plan carefully for memory allocated outside of your JVM process. This can cause difficulties with container managers such as YARN or Mesos, where you have to allow and plan for additional memory chunks besides your JVM process configuration.
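To make the planning concern concrete, here is a minimal sketch of how the per-executor container request grows once off-heap mode is enabled. Off-heap mode is disabled by default, and when spark.memory.offHeap.enabled is set to true, spark.memory.offHeap.size must be set to a positive value. The sizing rule below is an assumption based on Spark 3.x behavior on YARN (executor memory plus memory overhead plus off-heap size, with a default overhead of max(384 MiB, 10% of executor memory)); older versions did not add the off-heap size automatically, and the sketch ignores PySpark memory and the scheduler's rounding. The helper names are hypothetical and are not part of any Spark API.

```python
# Sketch only: estimates the memory a container manager must grant one executor.
# Assumption: Spark 3.x sizing on YARN, i.e.
#   container = spark.executor.memory + memory overhead + spark.memory.offHeap.size
# with a default overhead of max(384 MiB, 10% of executor memory).
# All helper names here are hypothetical illustrations, not Spark APIs.

MIN_OVERHEAD_MIB = 384
OVERHEAD_FACTOR = 0.10


def default_overhead_mib(executor_memory_mib):
    """Default memory overhead, in MiB, for a given executor heap size."""
    return max(MIN_OVERHEAD_MIB, int(executor_memory_mib * OVERHEAD_FACTOR))


def executor_conf(executor_memory_mib, offheap_mib=0):
    """Build the Spark settings for an executor, enabling off-heap only if sized."""
    conf = {"spark.executor.memory": f"{executor_memory_mib}m"}
    if offheap_mib > 0:
        # Both keys are needed: the flag alone is rejected without a positive size.
        conf["spark.memory.offHeap.enabled"] = "true"
        conf["spark.memory.offHeap.size"] = f"{offheap_mib}m"
    return conf


def container_request_mib(executor_memory_mib, offheap_mib=0, overhead_mib=None):
    """Estimate the total MiB requested from the container manager per executor."""
    if overhead_mib is None:
        overhead_mib = default_overhead_mib(executor_memory_mib)
    return executor_memory_mib + overhead_mib + offheap_mib


if __name__ == "__main__":
    on_heap_only = container_request_mib(8192)
    with_offheap = container_request_mib(8192, offheap_mib=2048)
    print(executor_conf(8192, offheap_mib=2048))
    print(f"on-heap only: {on_heap_only} MiB, with off-heap: {with_offheap} MiB")
```

The point of the comparison is that the off-heap region is extra memory on top of the JVM heap, so the container limit has to grow with it or the executor risks being killed by the container manager.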

Also, in off-heap mode Tungsten uses sun.misc.Unsafe, which might not be desirable or even possible in your deployment scenario (for example, with a restrictive Java security manager configuration).

I am also sharing a time-tagged video of a conference talk in which Josh Rosen is asked a similar question.
