Problem description
I have code with the following structure:
#pragma omp parallel
{
    #pragma omp for nowait
    {
        // first for loop
    }
    #pragma omp for nowait
    {
        // second for loop
    }
    #pragma omp barrier
    // <-- #pragma omp single/critical/atomic --> not sure
    dgemm_(....)
    #pragma omp for
    {
        // yet another for loop
    }
}
For dgemm_, I link with multithreaded MKL. I want MKL to use all 8 available threads. What is the best way to do so?
Recommended answer
This is a case of nested parallelism. MKL supports it, but it only works if your executable is built with the Intel C/C++ compiler. The reason for that restriction is that MKL uses Intel's OpenMP runtime, and different OpenMP runtimes do not play well with each other.
Once that is sorted out, you should enable nested parallelism by setting OMP_NESTED to TRUE and disable MKL's detection of nested parallelism by setting MKL_DYNAMIC to FALSE, so that MKL does not drop to a single thread when it is called from inside a parallel region. If the data to be processed with dgemm_ is shared, then you have to invoke dgemm_ from within a single construct. If each thread processes its own private data, then you don't need any synchronisation constructs, but using multithreaded MKL won't give you any benefit either. Therefore I would assume that your case is the former.
To summarise:
#pragma omp single
dgemm_(...);
and run with:
$ MKL_DYNAMIC=FALSE MKL_NUM_THREADS=8 OMP_NUM_THREADS=8 OMP_NESTED=TRUE ./exe
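The structure from the question, with the single construct in place, can be sketched roughly as follows. This is only an illustration under assumptions: `matmul_standin` is a hypothetical placeholder for the real dgemm_ call (so the sketch compiles without MKL), and the loop bodies and the `pipeline` name are made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical stand-in for dgemm_: computes c = a * b for n x n row-major
 * matrices. With MKL you would call dgemm_ here instead. */
static void matmul_standin(int n, const double *a, const double *b, double *c)
{
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j) {
            double sum = 0.0;
            for (int k = 0; k < n; ++k)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
}

/* Fills a and b in two independent loops, multiplies them once, then
 * scales c. The loop contents are illustrative only. */
void pipeline(int n, double *a, double *b, double *c)
{
    assert(n > 0 && a != NULL && b != NULL && c != NULL);

    #pragma omp parallel
    {
        #pragma omp for nowait
        for (int i = 0; i < n * n; ++i)
            a[i] = 1.0;              /* first for loop */

        #pragma omp for nowait
        for (int i = 0; i < n * n; ++i)
            b[i] = 2.0;              /* second for loop */

        /* Both loops used nowait, so wait until a and b are complete. */
        #pragma omp barrier

        /* Exactly one thread makes the call on the shared data; the
         * implicit barrier at the end of single holds the other threads
         * back until c is ready. */
        #pragma omp single
        matmul_standin(n, a, b, c);

        #pragma omp for
        for (int i = 0; i < n * n; ++i)
            c[i] *= 0.5;             /* yet another for loop */
    }
}
```

The explicit barrier is needed because both worksharing loops use nowait. The single construct ends with an implicit barrier of its own, so the last loop does not start until the product is complete.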
You could also set the parameters with the appropriate calls:
mkl_set_dynamic(0);
mkl_set_num_threads(8);
omp_set_nested(1);
#pragma omp parallel num_threads(8) ...
{
...
}
though I would prefer to use environment variables instead.
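One caveat on the OpenMP side: OpenMP 5.0 deprecated omp_set_nested and the OMP_NESTED variable in favour of omp_set_max_active_levels and OMP_MAX_ACTIVE_LEVELS. A minimal sketch of the newer call is shown below. The helper name `request_nested_levels` is hypothetical, and the MKL calls are left out so that the snippet builds without MKL.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Requests up to `levels` nested active parallel levels and returns the
 * value the runtime reports afterwards (1 when built without OpenMP). */
int request_nested_levels(int levels)
{
    assert(levels >= 1);
#ifdef _OPENMP
    omp_set_max_active_levels(levels);
    return omp_get_max_active_levels();
#else
    return 1;
#endif
}
```

Calling `request_nested_levels(2)` before the outer parallel region would play the role of `omp_set_nested(1)` in the listing above, allowing one nested level for the MKL threads.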