
Question

I am new to Apache Storm and am trying to work out how to configure Storm parallelism. There is a great article, "Understanding the Parallelism of a Storm Topology", but it only raises more questions for me.

As I understand it, when you have a multi-node Storm cluster, each topology is distributed as a whole according to the TOPOLOGY_WORKERS configuration parameter. So if you have 5 workers, you have 5 copies of the spout (1 per worker), and the same goes for bolts.

How can situations like the following be handled inside a Storm cluster (preferably without creating external services)?

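  1. I need exactly one spout across all instances of the topology, for example when input data is pushed to the cluster through a network folder that is scanned for new files.
  2. A similar issue for a specific bolt, for example when data is processed by a licensed third-party library that is locked to one specific physical machine.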

Answer

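First, the basics:

  1. Workers: run executors; each worker is its own JVM process.
  2. Executors: threads that run tasks; Storm distributes a topology's executors across its workers.
  3. Tasks: the instances that run your spout/bolt code.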

Second, a correction: having 5 workers does NOT mean you automatically have 5 copies of your spout. Having 5 workers means you have 5 separate JVMs to which Storm can assign executors (think of them as 5 buckets).
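To make the bucket picture concrete, here is a minimal, Storm-free sketch that deals executors out to workers round-robin. It is only an illustration under stated assumptions: Storm's actual scheduler is more involved, and the class name `WorkerBuckets`, its helper methods, the bolt id "1-bolt", and the bolt parallelism of 7 are all hypothetical. The point it shows is that the number of spout instances follows the spout's own parallelism, not the number of workers.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model, not Storm code: workers are buckets, executors are labels.
class WorkerBuckets {

    // Build one executor label per unit of parallelism for a component.
    static List<String> executorsFor(String component, int parallelism) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            result.add(component + "#" + i);
        }
        return result;
    }

    // Simplified assumption: executors are handed to workers round-robin.
    static List<List<String>> assign(int numWorkers, List<String> executors) {
        List<List<String>> workers = new ArrayList<>();
        for (int i = 0; i < numWorkers; i++) {
            workers.add(new ArrayList<>());
        }
        for (int i = 0; i < executors.size(); i++) {
            workers.get(i % numWorkers).add(executors.get(i));
        }
        return workers;
    }

    public static void main(String[] args) {
        List<String> executors = new ArrayList<>();
        executors.addAll(executorsFor("0-spout", 1)); // spout parallelism 1
        executors.addAll(executorsFor("1-bolt", 7));  // hypothetical bolt, parallelism 7

        // 5 workers = 5 buckets; the single spout executor lands in only one of them.
        List<List<String>> workers = assign(5, executors);
        for (int w = 0; w < workers.size(); w++) {
            System.out.println("worker " + w + ": " + workers.get(w));
        }
    }
}
```

Running `main` prints the contents of each of the 5 buckets; the single spout executor appears in exactly one of them, while the bolt executors are spread over all five.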

The number of instances of your spout is configured when you first create and submit your topology:

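TopologyBuilder builder = new TopologyBuilder();
// spoutParallelism is the parallelism hint (initial number of executors); spoutTasks is the number of tasks.
builder.setSpout("0-spout", new MySpout(), spoutParallelism).setNumTasks(spoutTasks);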

Since you want only one spout for the entire cluster, you'd set both spoutParallelism and spoutTasks to 1.

