Problem description
Hi!
I have a rather horrible issue with my C++ AMP algorithms when processing a "large problem size" on some accelerators with only 1 GB memory. I have been able to isolate the problem in the following code:
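#include <amp.h>
#include <iostream>
#include <string>
#include <vector>
#include <tchar.h>

using namespace concurrency;
using namespace std;

// Copies a1 into a second device array, reads it back to the host,
// and counts the elements that are not 1.0f.
void foo_( const array< float >& a1 )
{
    accelerator_view av = a1.accelerator_view;
    array< float > a2( a1.extent, av );
    parallel_for_each( a2.extent,
        [&a2, &a1]( index< 1 > i ) restrict( amp ) {
            a2[i] = a1[i];
        } );
    av.wait();
    vector< float > v2( a2.extent[0] );
    copy( a2, v2.begin() );
    av.wait();
    int nDiff = 0;
    for ( size_t i = 0, N = v2.size(); i < N; ++i ) {
        if ( v2[i] != 1.0f ) {
            ++nDiff;
        }
    }
    wcout << "nDiff=" << nDiff << endl;
}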
int _tmain( int argc,
            _TCHAR* argv[] )
{
    //wstring devicePath = L"direct3d\\warp";
    //wstring devicePath = L"PCI\\VEN_1002&DEV_68B8&SUBSYS_29901682&REV_00\\4&362F0840&0&0018"; // Radeon HD 5770, 1 GB
    wstring devicePath = L"PCI\\VEN_10DE&DEV_0DD8&SUBSYS_084A10DE&REV_A1\\4&1FF63B03&0&0038"; // Quadro 2000, 1 GB
    accelerator_view av = accelerator( devicePath ).create_view();
    wcout << "accelerator=" << av.accelerator.description << endl;
    array< float > a1( 600 * 600 * 600, av ); // 4*600^3 = 864 MB
    parallel_for_each( a1.extent,
        [&a1]( index< 1 > i ) restrict( amp ) {
            a1[i] = 1.0f;
        } );
    av.wait();
    // Function body of
    // void foo_( const array< float >& a1 )
    {
        accelerator_view av = a1.accelerator_view;
        array< float > a2( a1.extent, av );
        parallel_for_each( a2.extent,
            [&a2, &a1]( index< 1 > i ) restrict( amp ) {
                a2[i] = a1[i];
            } );
        av.wait();
        vector< float > v2( a2.extent[0] );
        copy( a2, v2.begin() );
        av.wait();
        int nDiff = 0;
        for ( size_t i = 0, N = v2.size(); i < N; ++i ) {
            if ( v2[i] != 1.0f ) {
                ++nDiff;
            }
        }
        wcout << "nDiff=" << nDiff << endl;
    }
    foo_( a1 );
    return 0;
}
(System: Win 8, VS 2012, Solution platform x64)
Running on WARP or on a Radeon HD 5770 1 GB it works fine:
$ ./CppAmpConstArrayBug.exe
accelerator=Microsoft Basic Render Driver
nDiff=0
nDiff=0
$ ./CppAmpConstArrayBug.exe
accelerator=AMD Radeon HD 5700 Series
nDiff=0
nDiff=0
However, on a Quadro 2000 1 GB (driver version 332.50) I get:
$ ./CppAmpConstArrayBug.exe
accelerator=NVIDIA Quadro 2000
nDiff=0
nDiff=81782272
Further inspection reveals that it is the last part of a2/v2 that differs (zeroed out).
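For anyone who wants to repeat that inspection, here is a minimal sketch of a host-side check that reports where the mismatches sit instead of only counting them. `MismatchReport` and `find_mismatches` are hypothetical names made up for this illustration; they are not part of C++ AMP or of the code above. It would be called on `v2` after the `copy`.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the original post): summarizes where a
// host-side buffer deviates from the expected fill value.
struct MismatchReport {
    std::size_t count = 0;  // number of elements != expected
    std::size_t first = 0;  // index of the first mismatch (valid only if count > 0)
    std::size_t last  = 0;  // index of the last mismatch  (valid only if count > 0)
};

inline MismatchReport find_mismatches( const std::vector< float >& v, float expected )
{
    MismatchReport r;
    for ( std::size_t i = 0; i < v.size(); ++i ) {
        if ( v[i] != expected ) {
            if ( r.count == 0 ) {
                r.first = i;  // remember where the bad region starts
            }
            r.last = i;
            ++r.count;
        }
    }
    return r;
}
```

If the mismatches form one contiguous tail, `last - first + 1` equals `count` and `last` equals `v.size() - 1`.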
I also see the same issue when using a GTX 560 Ti 1 GB. Since it only happens on NVIDIA cards, maybe there's a driver bug?
However, the strange thing is that if I remove the "const" in "void foo_( const array< float >& a1 )" it also works on the NVIDIA cards!
I am grateful for any insight you can offer on this matter. For example: what causes this problem? Is it fixed in VS2013 (if it is a C++ AMP bug)? Is it a driver bug? Are there any similar bugs I should watch out for?
Cheers,
T
Recommended answer
Thanks for reporting this issue.
This looks like an NVIDIA driver bug, as I was able to repro the issue on NVIDIA cards only. The issue repros even in VS2013. In particular, the bug can be reproduced easily when the number of elements is greater than
134217728 ( Dec )
1000000000000000000000000000 ( Bin )
0x8000000 ( Hex )
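As a quick sanity check on the numbers in this thread, the reported nDiff on the Quadro 2000 is exactly the number of elements beyond that 2^27 threshold. The sketch below only verifies the arithmetic; it does not explain why the driver misbehaves. The constant and function names are made up for this illustration, and a 4-byte float is assumed.

```cpp
#include <cstdint>

// Values taken from the posts above.
constexpr std::uint64_t kElements  = 600ull * 600ull * 600ull;  // size of a1 and a2
constexpr std::uint64_t kThreshold = 1ull << 27;                // 134217728 == 0x8000000
constexpr std::uint64_t kReported  = 81782272ull;               // nDiff on the Quadro 2000

static_assert( kElements == 216000000ull, "600^3 elements" );
static_assert( kThreshold == 134217728ull, "2^27" );
// Assuming 4-byte floats, 2^27 elements occupy exactly 512 MiB.
static_assert( kThreshold * 4ull == 512ull * 1024ull * 1024ull, "2^27 floats = 512 MiB" );
// Every element past index 2^27 - 1 accounts for the reported difference.
static_assert( kElements - kThreshold == kReported, "nDiff equals the tail past 2^27" );

// Number of elements of an n-element array that lie past the threshold.
constexpr std::uint64_t elements_past_threshold( std::uint64_t n )
{
    return n > kThreshold ? n - kThreshold : 0;
}
```

This is consistent with the observation that only the last part of a2/v2 is zeroed out.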
I am investigating the difference in behavior caused by the 'const' declaration and will update you.
Regards,
Pavan