This article looks at a Spark Streaming problem where batches stay queued/active for a long time, and how it was resolved. It may be a useful reference if you run into the same issue.

Problem Description

Could anyone please point out what is causing these active batches to hang here for many weeks without ever being processed? Thanks a lot.

My guess is that there are not enough executors, and that more workers/executors would solve the problem. Or does Spark assign priorities to different batches within its task scheduler?
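If too few executors really were the bottleneck, scaling them up is a configuration change. A minimal sketch in Scala, assuming a hypothetical application name and resource values (none of these numbers come from the question):

import org.apache.spark.SparkConf

// Illustrative sketch only: the values are assumptions, not taken from the
// question. More executor instances/cores give the streaming job more
// parallelism to drain queued batches.
val conf = new SparkConf()
  .setAppName("streaming-job")            // hypothetical app name
  .set("spark.executor.instances", "8")   // assumed value
  .set("spark.executor.memory", "4g")     // assumed value
  .set("spark.executor.cores", "2")       // assumed value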

But the situation here is that very recent batches (end of June) were processed successfully, while batches from May are still queued.

I just checked my Spark settings; the scheduler policy is FIFO:

spark.scheduler.mode    FIFO
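For reference, the effective mode can also be read from a running SparkContext; a minimal sketch (FIFO is Spark's default when spark.scheduler.mode is not set explicitly):

import org.apache.spark.SparkContext

// Read the scheduler mode from the active context; falls back to the
// default "FIFO" if the key was never set explicitly.
val sc = SparkContext.getOrCreate()
val mode = sc.getConf.get("spark.scheduler.mode", "FIFO")
println(s"scheduler mode: $mode")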

Recommended Answer

It turned out that the master node was the bottleneck. The master node was short of memory, so the scheduler probably could not process batches fast enough.

Solution: change the master node to a more powerful EC2 instance.
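The accepted fix was a larger EC2 instance. A related, more incremental knob is the memory available to the driver (and, on a standalone cluster, to the Master daemon); the sketch below only illustrates those standard Spark settings, and the value is an assumption:

import org.apache.spark.SparkConf

// Sketch only; the value is an assumption. spark.driver.memory takes effect
// when the driver JVM is launched, so in client mode it is usually passed as
// spark-submit --driver-memory instead of being set here.
val conf = new SparkConf()
  .set("spark.driver.memory", "8g")       // assumed value
// For a standalone Master daemon itself, SPARK_DAEMON_MEMORY in
// conf/spark-env.sh controls the daemon heap size.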

That wraps up this article on Spark Streaming batches staying queued/active for a long time. We hope the recommended answer is helpful.
