Spreading OpenMP threads among NUMA nodes

Question

I have a matrix distributed across the local memories of four NUMA nodes. I want to start 4 threads, each pinned to a CPU on a different NUMA node, so that each thread can access its part of the matrix as fast as possible. OpenMP has the proc_bind(spread) option, but it places the threads on far-apart CPUs within the same NUMA node.

How can I force the threads to bind to different NUMA nodes?

Or, if that is not possible: when I use all cores on all nodes (256 threads in total), I know how to get the ID of the NUMA node, but I can't control which thread gets which indices, e.g. in a for loop. How can I distribute my workload efficiently with respect to the NUMA configuration?

Recommended answer

Here is what I would do:

  1. Check which cores are attached to which NUMA node using numactl -H
  2. Assuming, for example, that cores 0, 1, 2 and 3 each sit on a different one of the 4 NUMA nodes you want to use, set the environment variable OMP_PLACES to bind the threads to these cores: export OMP_PLACES="{0},{1},{2},{3}"
  3. Finally, launch your OpenMP binary with numactl's local memory allocation policy: numactl -l myBinary
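
The steps above can be sketched as a small script that builds OMP_PLACES from the numactl -H output instead of typing the core IDs by hand. This is only a sketch under assumptions: the helper names places_from_numactl and launch_one_thread_per_node are made up for illustration, picking the first CPU listed for each node is an arbitrary choice, and setting OMP_PROC_BIND and OMP_NUM_THREADS is an addition that the answer itself does not mention.

```shell
#!/bin/sh
# Sketch: derive OMP_PLACES from `numactl -H` output.
# places_from_numactl and launch_one_thread_per_node are hypothetical helper names.

# Reads `numactl -H`-style text on stdin and prints one place per NUMA node,
# using the first CPU listed on each "node N cpus: ..." line.
# Nodes that list no CPUs are skipped.
places_from_numactl() {
    awk '/^node [0-9]+ cpus:/ && NF >= 4 {
             printf "%s{%s}", sep, $4   # $4 is the first CPU id on the line
             sep = ","
         }
         END { print "" }'
}

# Launches the given command with one bound thread per NUMA node and
# numactl local memory allocation (-l), following the steps above.
launch_one_thread_per_node() {
    places=$(numactl -H | places_from_numactl)
    # Count the places so that the thread count matches them (assumption:
    # one thread per node is what is wanted).
    nplaces=$(printf '%s\n' "$places" | tr ',' '\n' | grep -c '{')
    OMP_PLACES="$places" OMP_PROC_BIND=true OMP_NUM_THREADS="$nplaces" \
        numactl -l "$@"
}

# Usage (not run here): launch_one_thread_per_node ./myBinary
```

Generating the list from numactl -H avoids hard-coding core IDs that differ between machines; on a system where the first CPUs of the nodes are not the ones you want, the awk field can be adjusted.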

From what I understood of your question, that should work.
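
One way to check that the binding took effect is to ask the OpenMP runtime to report its settings at startup. The sketch below assumes a runtime that honors the standard OMP_DISPLAY_ENV variable (OpenMP 4.0 and later) and OMP_DISPLAY_AFFINITY variable (OpenMP 5.0 and later); older runtimes may ignore the latter. The helper name run_with_affinity_report is hypothetical.

```shell
#!/bin/sh
# Sketch: run a command with OpenMP's own diagnostics switched on.
# run_with_affinity_report is a hypothetical helper name.
run_with_affinity_report() {
    # OMP_DISPLAY_ENV prints the runtime's settings, including the places
    # and bind policy; OMP_DISPLAY_AFFINITY prints per-thread binding.
    OMP_DISPLAY_ENV=true OMP_DISPLAY_AFFINITY=true "$@"
}

# Usage (not run here): run_with_affinity_report numactl -l ./myBinary
```

If the reported places do not show one core per NUMA node, compare them against the output of numactl -H.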

06-06 18:25