Problem description
My streaming Dataflow job (2017-09-08_03_55_43-9675407418829265662), using Apache Beam SDK for Java 2.1.0, will not scale past 1 worker even with a growing Pub/Sub queue (now 100k undelivered messages). Do you have any ideas why?
It's currently running with autoscalingAlgorithm=THROUGHPUT_BASED and maxNumWorkers=10.
Recommended answer
Dataflow Engineer here. I looked up the job in our backend and I can see that it is not scaling up because CPU utilization is low, meaning something else is limiting the performance of the pipeline, such as external throttling. Upscaling rarely helps in these cases.
I also see that some bundles are taking up to several hours to process. I recommend investigating your pipeline logic to see whether there are other parts that can be optimized.
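One way to start that investigation is to time the per-element work and count how many calls are slow. That can show whether a step is waiting on something external rather than using CPU. The sketch below is plain Java, not a Beam or Dataflow API. The class name `SlowCallTimer`, its getters, and the threshold are hypothetical names chosen for illustration.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

/**
 * Hypothetical helper (not part of Beam): wraps a per-element function and
 * records how many calls take at least a given threshold.
 * Not thread-safe; intended to be used from one thread at a time.
 */
class SlowCallTimer<I, O> {
    private final Function<I, O> delegate;
    private final long slowThresholdNanos;
    private long calls = 0;
    private long slowCalls = 0;
    private long maxNanos = 0;

    SlowCallTimer(Function<I, O> delegate, long slowThresholdMillis) {
        this.delegate = delegate;
        this.slowThresholdNanos = TimeUnit.MILLISECONDS.toNanos(slowThresholdMillis);
    }

    /** Runs the wrapped function and records its wall-clock duration. */
    O apply(I input) {
        long start = System.nanoTime();
        try {
            return delegate.apply(input);
        } finally {
            long elapsed = System.nanoTime() - start;
            calls++;
            if (elapsed > maxNanos) {
                maxNanos = elapsed;
            }
            if (elapsed >= slowThresholdNanos) {
                slowCalls++;
            }
        }
    }

    long getCalls() {
        return calls;
    }

    long getSlowCalls() {
        return slowCalls;
    }

    /** Longest observed call, in whole milliseconds. */
    long getMaxMillis() {
        return TimeUnit.NANOSECONDS.toMillis(maxNanos);
    }
}
```

To use it, wrap the body of the suspect step (for example, the call to an external service) in `apply` and log the counters periodically. A high share of slow calls while CPU stays low would be consistent with the throttling described above.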