Problem description
This is a question about how Storm's max spout pending works. I currently have a spout that reads a file and emits a tuple for each line in the file (I know Storm is not the best solution for dealing with files, but I do not have a choice for this problem).
I set topology.max.spout.pending to 50k to throttle how many tuples enter the topology for processing. However, this setting appears to have no effect: every record in the file is emitted every time. My guess is that this is due to a loop in my nextTuple() method that emits all records in the file.
My question is: does Storm simply stop calling nextTuple() for the spout task once topology.max.spout.pending is reached? Does this mean I should emit only one tuple each time the method is called?
Recommended answer
Exactly! Storm can only throttle your spout by holding back the next nextTuple() call, so if you emit everything the first time nextTuple() is called, there is no way for Storm to throttle your spout.
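To make that concrete, here is a minimal sketch of a reader that hands over at most one line per call and then returns. It is plain Java and deliberately does not use the Storm API: the class name OneLinePerCallReader, the emit callback, and the counter used as a message ID are all hypothetical stand-ins. In a real spout the callback would roughly correspond to emitting through the spout's collector with a message ID.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.function.BiConsumer;

/** Hypothetical stand-in shaped like a spout; it does not use the Storm API. */
final class OneLinePerCallReader {
    private final BufferedReader reader;
    private long nextMessageId = 0;   // assumed ID scheme: a simple counter
    private boolean exhausted = false;

    OneLinePerCallReader(BufferedReader reader) {
        this.reader = reader;
    }

    /**
     * Emits at most one line, then returns so the caller regains control
     * and can decide whether to call again.
     */
    void nextTuple(BiConsumer<String, Long> emit) {
        if (exhausted) {
            return; // nothing left; return promptly instead of blocking
        }
        try {
            String line = reader.readLine();
            if (line == null) {
                exhausted = true; // end of file reached
                return;
            }
            emit.accept(line, nextMessageId++);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The message ID matters in the real setting: as the Storm configuration documentation describes it, max spout pending counts only tuples that were emitted with a message ID, so a spout that emits without one is not throttled by this setting at all.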
The Storm developers recommend emitting a single tuple per nextTuple() call. The Storm framework will then throttle your spout as needed to honor the max spout pending setting. If you are emitting a large number of tuples, you can batch your emits, keeping each batch to at most a tenth of your max spout pending, to give Storm the chance to throttle.
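The effect of batch size on throttling can be illustrated with a toy model. The sketch below is not Storm code: ThrottleModel and peakPending are hypothetical names, and the model assumes the framework calls nextTuple() only while the number of un-acked tuples is below the cap, and that downstream acks arrive one at a time once the spout is blocked (a worst case for slow bolts).

```java
/** Toy model of spout throttling; hypothetical, not Storm code. */
final class ThrottleModel {
    /**
     * Returns the peak number of un-acked tuples while emitting totalLines.
     * emitsPerCall is how many tuples each nextTuple() call emits.
     */
    static int peakPending(int totalLines, int emitsPerCall, int maxPending) {
        if (emitsPerCall <= 0 || maxPending <= 0) {
            throw new IllegalArgumentException("emitsPerCall and maxPending must be positive");
        }
        int remaining = totalLines;
        int pending = 0;
        int peak = 0;
        while (remaining > 0) {
            if (pending < maxPending) {
                // Below the cap: the framework calls nextTuple() once.
                int emitted = Math.min(emitsPerCall, remaining);
                remaining -= emitted;
                pending += emitted;
                peak = Math.max(peak, pending);
            } else {
                // At or above the cap: no call is made; one ack frees a slot.
                pending--;
            }
        }
        return peak;
    }
}
```

Under these assumptions, with 200,000 lines and a cap of 50,000, emitting one tuple per call peaks at exactly 50,000 pending tuples, emitting the whole file in one call peaks at 200,000 (the cap is never consulted in time), and emitting batches of 5,000 (a tenth of the cap) peaks at 54,999. The overshoot is bounded by the batch size, which is the reasoning behind keeping batches small relative to the cap.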