Question
I was just reading more about Storm and came across its ability to do fields grouping. For example, if you were counting tweets per user and had two tasks with a fields grouping on user-id, tuples with the same user-id would always be sent to the same task.
So task 1 could have the following counts in memory:
bob: 10
alice: 5

Task 2 could have the following counts in memory:
jill: 10
joe: 4
If I add a new machine to the cluster to increase capacity and run rebalance, what happens to the counts in memory? Will I start to see users with different counts?
Recommended answer
With fields grouping, we can direct tuples with a specific field value to a particular task.
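Fields grouping: The stream is partitioned by the fields specified in the grouping. For example, if the stream is grouped by the "user-id" field, tuples with the same "user-id" will always go to the same task, but tuples with different "user-id" values may go to different tasks.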
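To make the partitioning concrete, here is a minimal Python sketch of the idea. It assumes the target task is chosen as a stable hash of the grouping field modulo the number of tasks, which is a simplified model and not Storm's actual code. The names choose_task and count_tweets are hypothetical helpers invented for this illustration.

```python
import zlib


def choose_task(user_id: str, num_tasks: int) -> int:
    """Map a grouping-field value to a task index (hypothetical helper)."""
    # Assumption: a stable hash modulo the task count. crc32 is used because
    # Python's built-in hash() for strings is randomized per process.
    return zlib.crc32(user_id.encode("utf-8")) % num_tasks


def count_tweets(user_ids, num_tasks):
    """Simulate per-task in-memory counters fed through a fields grouping."""
    counts = [dict() for _ in range(num_tasks)]
    for uid in user_ids:
        task = choose_task(uid, num_tasks)
        counts[task][uid] = counts[task].get(uid, 0) + 1
    return counts


stream = ["bob", "alice", "bob", "jill", "joe", "bob", "alice"]
per_task = count_tweets(stream, num_tasks=2)

for index, task_counts in enumerate(per_task):
    print(f"task {index}: {task_counts}")
```

Which users land on which task depends on the hash, but every user-id appears in exactly one task's counter, so each per-user count is complete.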
The number of tasks is static throughout a topology's life cycle. What you can change with rebalance is the number of executors (threads). Adding a new node to the cluster lets you reconfigure the number of executors without shutting down the topology, but the number of tasks stays the same regardless. Adding a node simply lets you improve performance by tuning Storm's parallelism.
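The following Python sketch illustrates why a fixed task count keeps the routing stable. It is a simplified model under stated assumptions: the hash-modulo routing, the round-robin placement of tasks on executors, and the value of NUM_TASKS are all illustrative choices, not Storm's real scheduler. It models routing only and says nothing about what happens to in-memory state when workers are restarted.

```python
import zlib

NUM_TASKS = 4  # assumed value; fixed when the topology is submitted


def choose_task(user_id: str, num_tasks: int = NUM_TASKS) -> int:
    """Route a user-id to a task index (hypothetical helper)."""
    return zlib.crc32(user_id.encode("utf-8")) % num_tasks


def assign_tasks(num_tasks: int, num_executors: int):
    """Spread task ids over executors round-robin (illustrative only)."""
    executors = [[] for _ in range(num_executors)]
    for task in range(num_tasks):
        executors[task % num_executors].append(task)
    return executors


users = ["bob", "alice", "jill", "joe"]

# Before the rebalance: 4 tasks shared by 2 executors.
before = {u: choose_task(u) for u in users}
layout_before = assign_tasks(NUM_TASKS, 2)

# After the rebalance: the same 4 tasks, now spread over 4 executors.
after = {u: choose_task(u) for u in users}
layout_after = assign_tasks(NUM_TASKS, 4)

print(layout_before)    # [[0, 2], [1, 3]]
print(layout_after)     # [[0], [1], [2], [3]]
print(before == after)  # True
```

Because the routing depends only on the number of tasks, changing the number of executors moves tasks between threads but does not change which task a given user-id is sent to.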