Please help. 1) I need to use memcpy to move arrays allocated on the GPU. I cannot use std::memcpy because it "has no acc routine" (compiler output). My code is:
const int GL=100000;
Particle particles[GL];
int cp01[2][GL];
#pragma acc declare create(particles,cp01)
I read that cudaMemcpy can be used with OpenACC. In function_device() (which is not able to fill the array allocated on the GPU), I call from the host:
#pragma acc data copy(cp)
#include <cuda_runtime.h>
That include is what I added to use CUDA. I build the project with:
cmake ../src -DCMAKE_CXX_COMPILER=pgc++ -DCMAKE_CXX_FLAGS="-acc -Minfo=all -Mcuda=llvm"
The program compiles but does not work: it hangs with no output on the console. How can I move arrays allocated on the device (using cudaMemcpy or in some other way)? Is that one include enough to use CUDA? Am I building the project correctly (is -Mcuda=llvm necessary or not)?
2) I also have another question: if one writes
#pragma acc parallel loop
for(int i=0; i<N; ++i)
must the variable N be allocated on the host only, or may it also be on the GPU?
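Since "cudaMemcpy" is a host-side call to which you pass device pointers, you'll want to use a "host_data" directive so that "particles" resolves to its device address inside the call. There is no need to copy "cp", since the call uses its host value. Also make sure the host values of "cp01" are current (for example via an "update self" directive if they were last modified on the device), because the indices are read on the host.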
#pragma acc host_data use_device(particles)
cudaMemcpy(&particles[cp01[0][0]], &particles[cp01[1][0]], cp*sizeof(Particle), cudaMemcpyDeviceToDevice);
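If you would rather avoid the CUDA runtime entirely, the same move can be written as a plain OpenACC loop that runs on the device. This is only a minimal sketch, not the code from the question: the "Particle" fields and the helper name "move_particles_device" are made up for illustration, and it assumes the source and destination ranges do not overlap (a parallel copy of overlapping ranges would be a race).

```cpp
// Hypothetical stand-in for the question's Particle type (fields are assumed).
struct Particle { double x, y, z; };

const int GL = 100000;
Particle particles[GL];
#pragma acc declare create(particles)

// Copy `count` elements from particles[src..] to particles[dst..] on the device.
// Assumption: the two ranges do not overlap.
// dst, src and count are ordinary host scalars; OpenACC passes scalars used in
// a parallel region to the device by value (firstprivate by default).
void move_particles_device(int dst, int src, int count) {
    #pragma acc parallel loop present(particles)
    for (int i = 0; i < count; ++i) {
        particles[dst + i] = particles[src + i];
    }
}
```

With the indices from the question, the call would look like move_particles_device(cp01[0][0], cp01[1][0], cp), again reading "cp01" and "cp" on the host. When built without -acc the pragmas are ignored and the loop simply runs on the host, which is handy for checking the indexing.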