Problem description
I have a matrix spread across the local memories of four NUMA nodes. Now I want to start 4 threads, each on a CPU belonging to a different NUMA node, so that each thread can access its part of the matrix as fast as possible. OpenMP has the proc_bind(spread) clause, but it puts the threads on CPUs that are far apart yet still on the same NUMA node.
How can I force the threads to bind to different NUMA nodes?
Or, if that is not possible: when I use all cores on all nodes (256 threads in total), I know how to get the ID of the NUMA node, but I can't control which thread gets which indices, e.g. in a for loop. How can I distribute my workload efficiently with respect to the NUMA configuration?
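For the second part, one common approach (not taken from the original thread) is to stop relying on the loop scheduler and compute each thread's index range explicitly, then initialise the data with that same partition so that Linux's default first-touch policy places each block's pages on the node of the thread that writes them first. A minimal C sketch, assuming the threads are already pinned (e.g. via OMP_PLACES), the thread count is identical in every parallel region, and the array is a fresh malloc allocation; block_range, team_info, init_local and sum_local are hypothetical names:

```c
#include <stddef.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Split n items into nthreads contiguous blocks whose sizes differ by at most
   one; thread tid owns the half-open range [*lo, *hi). */
static void block_range(size_t n, int nthreads, int tid, size_t *lo, size_t *hi)
{
    size_t base = n / (size_t)nthreads;
    size_t rem = n % (size_t)nthreads;
    size_t t = (size_t)tid;
    /* The first `rem` threads each take one extra item. */
    *lo = t * base + (t < rem ? t : rem);
    *hi = *lo + base + (t < rem ? 1 : 0);
}

/* Thread id and team size; falls back to a single thread without OpenMP. */
static void team_info(int *tid, int *nthreads)
{
#ifdef _OPENMP
    *tid = omp_get_thread_num();
    *nthreads = omp_get_num_threads();
#else
    *tid = 0;
    *nthreads = 1;
#endif
}

/* First-touch initialisation: each thread writes only its own block, so under
   Linux's default first-touch policy those pages are allocated on the NUMA
   node of the (pinned) thread that touches them first. Assumes `a` is a fresh
   malloc allocation whose pages have not been written yet. */
static void init_local(double *a, size_t n)
{
    #pragma omp parallel
    {
        int tid, nt;
        size_t lo, hi;
        team_info(&tid, &nt);
        block_range(n, nt, tid, &lo, &hi);
        for (size_t i = lo; i < hi; ++i)
            a[i] = (double)i;
    }
}

/* Later sweeps reuse the same partition, so each thread reads only the pages
   it placed itself (assuming the same thread count and binding). */
static double sum_local(const double *a, size_t n)
{
    double total = 0.0;
    #pragma omp parallel reduction(+:total)
    {
        int tid, nt;
        size_t lo, hi;
        team_info(&tid, &nt);
        block_range(n, nt, tid, &lo, &hi);
        for (size_t i = lo; i < hi; ++i)
            total += a[i];
    }
    return total;
}
```

The ranges are computed by hand rather than with schedule(static) because OpenMP only guarantees an identical static-schedule assignment for loops bound to the same parallel region; the explicit partition keeps the thread-to-index mapping fixed across regions.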
Recommended answer
Here is what I would do:
- Check which cores are attached to which NUMA node using
  numactl -H
- Assuming, for example, that cores 0, 1, 2 and 3 each sit on a different one of the 4 NUMA nodes you want to use, set the environment variable OMP_PLACES to bind the threads to these cores:
  export OMP_PLACES="{0},{1},{2},{3}"
- Finally, launch your OpenMP binary with numactl's local memory allocation policy:
  numactl -l myBinary
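Instead of reading numactl -H by hand, a list with one CPU per node can also be assembled from sysfs, where Linux exposes each node's CPUs in /sys/devices/system/node/node<N>/cpulist (for example "0-15,64-79"). The following C sketch is not part of the original answer; first_cpu_in_list, build_places and first_cpu_per_node are hypothetical helpers, and it assumes node numbers are contiguous from 0 and that every node has at least one CPU.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return the first CPU number in a Linux cpulist string such as "0-15,64-79",
   or -1 if the string does not start with a number (e.g. a CPU-less node). */
static int first_cpu_in_list(const char *cpulist)
{
    char *end;
    long cpu = strtol(cpulist, &end, 10);
    return end == cpulist ? -1 : (int)cpu;
}

/* Format one single-CPU place per entry, e.g. "{0},{16},{32},{48}".
   Returns 0 on success, -1 if the buffer is too small. */
static int build_places(char *out, size_t cap, const int *cpus, int n)
{
    size_t used = 0;
    if (cap == 0)
        return -1;
    out[0] = '\0';
    for (int i = 0; i < n; ++i) {
        int w = snprintf(out + used, cap - used, "%s{%d}", i ? "," : "", cpus[i]);
        if (w < 0 || (size_t)w >= cap - used)
            return -1;
        used += (size_t)w;
    }
    return 0;
}

/* Read the first CPU of nodes 0, 1, 2, ... from sysfs, stopping at the first
   missing or CPU-less node. Returns the number of nodes found. */
static int first_cpu_per_node(int *cpus, int max_nodes)
{
    int n = 0;
    for (; n < max_nodes; ++n) {
        char path[64], line[4096];
        snprintf(path, sizeof path, "/sys/devices/system/node/node%d/cpulist", n);
        FILE *f = fopen(path, "r");
        if (!f)
            break;
        int ok = fgets(line, sizeof line, f) != NULL;
        fclose(f);
        if (!ok || (cpus[n] = first_cpu_in_list(line)) < 0)
            break;
    }
    return n;
}
```

Printing the string produced by build_places (for instance from a tiny main) and exporting it as OMP_PLACES gives one single-CPU place per node, the same shape as the export line in the answer.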
From what I understood of your question, that should work.
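To confirm that each thread really ended up on the intended core, a small check can record, per thread, the OpenMP place it is bound to (omp_get_place_num, OpenMP 4.5) and the CPU it is running on (sched_getcpu, glibc/Linux only). This is a minimal sketch, not part of the original answer; report_binding and print_binding are hypothetical names. Setting OMP_PROC_BIND=true alongside OMP_PLACES makes the binding request explicit, and with an OpenMP 5.0 runtime OMP_DISPLAY_AFFINITY=true prints similar information without any code.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Record, for each OpenMP thread, its place (-1 if unbound) and current CPU.
   Returns the number of threads recorded (at most max). */
static int report_binding(int *place, int *cpu, int max)
{
    int seen = 0;
    #pragma omp parallel
    {
        int tid = 0, nt = 1, pl = -1;
#ifdef _OPENMP
        tid = omp_get_thread_num();
        nt = omp_get_num_threads();
        pl = omp_get_place_num();
#endif
        if (tid < max) {
            place[tid] = pl;
            cpu[tid] = sched_getcpu();
        }
        #pragma omp single
        seen = nt < max ? nt : max;
    }
    return seen;
}

static void print_binding(void)
{
    int place[256], cpu[256];
    int n = report_binding(place, cpu, 256);
    for (int t = 0; t < n; ++t)
        printf("thread %d: place %d, cpu %d\n", t, place[t], cpu[t]);
}
```

Called from a small main and built with gcc -fopenmp, it could be run as
OMP_PLACES="{0},{1},{2},{3}" OMP_PROC_BIND=true OMP_NUM_THREADS=4 numactl -l ./check
where ./check is a hypothetical binary name, and the printed CPU numbers compared against the node lists shown by numactl -H.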