Question
The "parallelism hint" is used in Storm to parallelise a running Storm topology. I know there are concepts like worker processes, executors and tasks. Would it make sense to make the parallelism hint as big as possible so that your topologies are parallelised as much as possible?
My question is how to find the right parallelism hint for my Storm topologies. Does it depend on the scale of my Storm cluster, is it a topology/job-specific setting that varies from one topology to another, or does it depend on both?
Recommended answer
Adding to what @Chiron explained:
On "parallelism hint is used in Storm to parallelise a running Storm topology":
Actually, in Storm the term parallelism hint specifies the initial number of executors (threads) of a component (spout or bolt), e.g.
topologyBuilder.setBolt("green-bolt", new GreenBolt(), 2)
The statement above tells Storm to allot 2 executor threads initially (this can be changed at run time). Again, in
topologyBuilder.setBolt("green-bolt", new GreenBolt(), 2).setNumTasks(4)
the setNumTasks(4) call tells Storm to run 4 associated tasks (this number stays the same throughout the lifetime of the topology). So in this case Storm will run two tasks per executor. By default, the number of tasks is set to be the same as the number of executors, i.e. Storm runs one task per thread.
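The "two tasks per executor" figure is simple arithmetic, and a small model may make it concrete. The sketch below is not Storm code: the class TaskLayout and the method tasksPerExecutor are hypothetical names, and the assumption that tasks are spread as evenly as possible over the executors is an illustration rather than a statement about Storm's scheduler.

```java
import java.util.Arrays;

// Illustrative model only; these names are hypothetical and not part of Storm.
class TaskLayout {

    // Spread numTasks over numExecutors as evenly as possible (assumed policy).
    static int[] tasksPerExecutor(int numTasks, int numExecutors) {
        if (numExecutors < 1 || numTasks < numExecutors) {
            throw new IllegalArgumentException("need 1 <= executors <= tasks");
        }
        int[] counts = new int[numExecutors];
        for (int task = 0; task < numTasks; task++) {
            counts[task % numExecutors]++; // round-robin assignment
        }
        return counts;
    }

    public static void main(String[] args) {
        // setBolt(..., 2).setNumTasks(4): 4 tasks on 2 executors
        System.out.println(Arrays.toString(tasksPerExecutor(4, 2))); // prints [2, 2]
        // Default case: task count equals the parallelism hint
        System.out.println(Arrays.toString(tasksPerExecutor(2, 2))); // prints [1, 1]
    }
}
```

Each entry in the result is the number of tasks one executor thread would have to work through.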
On "would it make sense to make the parallelism hint as big as possible so that your topologies are parallelised as much as possible":
One key thing to note is that if you intend to run more than one task per executor, it does not increase the level of parallelism. An executor uses a single thread to process all of its tasks, i.e. tasks run serially on an executor.
The purpose of configuring more than one task per executor is that the number of executors (threads) can then be changed with the rebalancing mechanism while the topology is still running (remember that the number of tasks stays the same throughout the life cycle of a topology).
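To see why the fixed task count matters for rebalancing, here is a minimal sketch under one stated assumption: a component can never run more executors than it has tasks, so the task count acts as a ceiling on how far a rebalance can scale the component. RebalanceModel and effectiveExecutors are hypothetical names used for illustration; in a real cluster the change itself would be made with Storm's rebalance command or the Storm UI.

```java
// Illustrative model only; these names are hypothetical and not part of Storm.
class RebalanceModel {

    // Assumption: executors are capped by the component's fixed task count.
    static int effectiveExecutors(int requestedExecutors, int numTasks) {
        if (requestedExecutors < 1 || numTasks < 1) {
            throw new IllegalArgumentException("both values must be >= 1");
        }
        return Math.min(requestedExecutors, numTasks);
    }

    public static void main(String[] args) {
        int numTasks = 4; // fixed when the topology is submitted via setNumTasks(4)
        for (int requested : new int[] {1, 2, 4, 8}) {
            System.out.println(requested + " requested -> "
                    + effectiveExecutors(requested, numTasks) + " executor thread(s)");
        }
    }
}
```

Under this model a bolt declared with setNumTasks(4) can be rebalanced anywhere between 1 and 4 threads, while a bolt left at the default (tasks equal to the initial hint) has no headroom to grow.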
Increasing the number of workers (each responsible for running one or more executors for one or more components) might also give you a performance benefit, but this too is relative, as I found from this discussion where nathanmarz says:
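"Having more workers might give better performance, depending on where your bottleneck is. Each worker has a single thread that passes tuples on to the 0mq connections for transfer to other workers, so if you're bottlenecked on CPU and each worker is dealing with lots of tuples, more workers will probably get you better throughput."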
So basically there is no definite answer to this; you should try different configurations based on your environment and design.