Problem description
In Flink, as I understand it, the JobManager can spread a job across multiple TaskManagers, each with multiple slots, as needed. For example, one job might be assigned three TaskManagers and use five slots in total.
Now suppose I run one TaskManager (TM) with three slots, and give it 3 GB of RAM and one CPU.
Is this exactly the same as running three TaskManagers that share one CPU, each with 1 GB of RAM?
Case 1
---------------
| 3G RAM      |
| one CPU     |
| three slots |
| TM          |
---------------
Case 2
--------------------------------------------
| one CPU                                  |
| ------------ ------------ ------------   |
| | 1G RAM   | | 1G RAM   | | 1G RAM   |   |
| | one slot | | one slot | | one slot |   |
| | TM       | | TM       | | TM       |   |
| ------------ ------------ ------------   |
--------------------------------------------
Recommended answer
The two setups are not the same: there are performance and operational differences, and they pull in both directions.
When running in a non-containerized environment with the RocksDB state backend, it can make sense to run a single TM per machine with many slots. This minimizes the per-TM overhead, although that overhead is not very significant.
On the other hand, running one slot per TM provides some helpful isolation and reduces the impact of garbage collection, which is particularly relevant with a heap-based state backend.
With containerized deployments, the general recommendation is one slot per TM until you reach significant scale, at which point you will want to scale by adding more slots per TM rather than more TMs. The reason is that the checkpoint coordinator needs to coordinate with each TM (but not with each slot), and as the number of TMs gets into the hundreds or thousands, this can become a bottleneck.
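To make the scaling argument concrete, here is a minimal Python sketch of the arithmetic. It is not Flink code: the function name `taskmanagers_needed` is hypothetical, and it assumes the job needs exactly as many slots as its parallelism (which is typical with default slot sharing, but is an assumption here).

```python
import math


def taskmanagers_needed(parallelism: int, slots_per_tm: int) -> int:
    """Return how many TaskManagers are needed to supply `parallelism` slots.

    Assumption (not from the answer above): required slots == parallelism.
    """
    if parallelism < 1 or slots_per_tm < 1:
        raise ValueError("parallelism and slots_per_tm must be positive")
    # Round up: a partially used TaskManager still counts as one TM.
    return math.ceil(parallelism / slots_per_tm)


if __name__ == "__main__":
    # Same parallelism, different slots-per-TM choices.
    for slots in (1, 4, 8):
        tms = taskmanagers_needed(1000, slots)
        print(f"{slots} slot(s) per TM -> {tms} TaskManagers")
```

At a parallelism of 1000, one slot per TM means 1000 TMs for the checkpoint coordinator to talk to, while eight slots per TM brings that down to 125. At small scale the difference is negligible, which is why one slot per TM remains the usual starting point.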