C++ AMP const array and broken lambda results without an exception

Problem description

Hi!

I have a rather horrible issue with my C++ AMP algorithms when processing a "large problem size" on some accelerators with only 1 GB of memory. I have been able to isolate the problem in the following code:

#include <amp.h>
#include <iostream>
#include <string>
#include <vector>
#include <tchar.h>

using namespace concurrency;
using namespace std;

// Copies a1 into a new array a2 on the same accelerator, reads a2 back
// to the host and counts the elements that are not 1.0f.
void foo_( const array< float >& a1 )
{
  accelerator_view av = a1.accelerator_view;

  // Second device buffer, same size as a1.
  array< float > a2( a1.extent, av );
  parallel_for_each( a2.extent,
                     [&a2, &a1]( index< 1 > i ) restrict( amp ) {
                       a2[i] = a1[i];
                     } );
  av.wait();

  // Copy the result back to the host and verify it.
  vector< float > v2( a2.extent[0] );
  copy( a2, v2.begin() );
  av.wait();
  int nDiff = 0;
  for ( size_t i = 0, N = v2.size(); i < N; ++i ) {
    if ( v2[i] != 1.0f ) {
      ++nDiff;
    }
  }
  wcout << "nDiff=" << nDiff << endl;
}

int _tmain( int argc,
            _TCHAR* argv[] )
{
  //wstring devicePath = L"direct3d\\warp";
  //wstring devicePath = L"PCI\\VEN_1002&DEV_68B8&SUBSYS_29901682&REV_00\\4&362F0840&0&0018"; // Radeon HD 5770, 1 GB
  wstring devicePath = L"PCI\\VEN_10DE&DEV_0DD8&SUBSYS_084A10DE&REV_A1\\4&1FF63B03&0&0038"; // Quadro 2000, 1 GB

  accelerator_view av = accelerator( devicePath ).create_view();
  wcout << "accelerator=" << av.accelerator.description << endl;

  // Fill a1 with 1.0f on the device.
  array< float > a1( 600 * 600 * 600, av ); // 4*600^3 = 864 MB
  parallel_for_each( a1.extent,
                     [&a1]( index< 1 > i ) restrict( amp ) {
                       a1[i] = 1.0f;
                     } );
  av.wait();

  // Function body of
  // void foo_( const array< float >& a1 )
  // inlined here, where a1 is a non-const array.
  {
    accelerator_view av = a1.accelerator_view;

    array< float > a2( a1.extent, av );
    parallel_for_each( a2.extent,
                       [&a2, &a1]( index< 1 > i ) restrict( amp ) {
                         a2[i] = a1[i];
                       } );
    av.wait();
    vector< float > v2( a2.extent[0] );
    copy( a2, v2.begin() );
    av.wait();
    int nDiff = 0;
    for ( size_t i = 0, N = v2.size(); i < N; ++i ) {
      if ( v2[i] != 1.0f ) {
        ++nDiff;
      }
    }
    wcout << "nDiff=" << nDiff << endl;
  }

  // The same operation, but accessing a1 through a const reference.
  foo_( a1 );

  return 0;
}
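A quick check of the device memory involved may help. This is a sketch that assumes 4-byte floats and reads "1 GB" as 2^30 bytes. The names `kElements`, `kDeviceBytes`, `bytes_per_array`, `peak_bytes_in_foo` and `fits` are hypothetical.

The arithmetic is as follows. a1 alone is 600^3 × 4 = 864,000,000 bytes. While the copy kernel runs, in the inlined block and again in foo_, a2 of the same size is alive too, so the peak is about 1.73 GB. Each array fits on a 1 GB card, but the pair does not.

The Radeon HD 5770 also has 1 GB and gives correct results. So oversubscription alone does not explain the failure, but it is plausibly the condition under which a driver problem shows up.

```cpp
#include <cassert>
#include <cstdint>

// Assumes 4-byte floats, as on the Windows x64 target in the post.
static_assert(sizeof(float) == 4, "expected 4-byte float");

// Element count used in the post: array< float > a1( 600 * 600 * 600, av ).
constexpr std::uint64_t kElements = 600ull * 600ull * 600ull;

// Assumption: "1 GB" read as 1 GiB (2^30 bytes). The memory actually
// available to an application is typically lower, since the driver and
// the display reserve part of it.
constexpr std::uint64_t kDeviceBytes = 1ull << 30;

// Bytes occupied by one array< float > with kElements elements.
constexpr std::uint64_t bytes_per_array() {
    return kElements * sizeof(float);
}

// Peak footprint while the copy kernel runs: a1 and a2 are both alive.
constexpr std::uint64_t peak_bytes_in_foo() {
    return 2 * bytes_per_array();
}

// Hypothetical helper: true if `bytes` fit within `device_bytes`.
bool fits(std::uint64_t bytes, std::uint64_t device_bytes) {
    assert(device_bytes > 0);
    return bytes <= device_bytes;
}
```

With these definitions, `fits(bytes_per_array(), kDeviceBytes)` is true, while `fits(peak_bytes_in_foo(), kDeviceBytes)` is false.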

(System: Win 8, VS 2012, Solution platform x64)

Running on WARP or on a Radeon HD 5770 1 GB, it works fine:

$ ./CppAmpConstArrayBug.exe
accelerator=Microsoft Basic Render Driver
nDiff=0
nDiff=0

$ ./CppAmpConstArrayBug.exe
accelerator=AMD Radeon HD 5700 Series
nDiff=0
nDiff=0

However, on a Quadro 2000 1 GB (driver version 332.50) I get:

$ ./CppAmpConstArrayBug.exe
accelerator=NVIDIA Quadro 2000
nDiff=0
nDiff=81782272

Further inspection reveals that it is the last part of a2/v2 that differs (zeroed out).
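The counting loop reports only how many elements are wrong. A slightly richer check also reports where the wrong elements start and whether they form one contiguous tail. `TailReport` and `scan_tail` are hypothetical names, and this is a diagnostic sketch, not part of the original code.

The reported numbers already say something. N = 600^3 = 216,000,000 and nDiff = 81,782,272. If the mismatches are exactly the tail, the correctly copied prefix is 216,000,000 − 81,782,272 = 134,217,728 = 2^27 elements, which is exactly 512 MiB of floats. That round figure hints at a 512 MiB boundary somewhere in the copy path, but it does not by itself identify the cause.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Result of scanning a read-back buffer against an expected value.
struct TailReport {
    std::size_t mismatches = 0;     // same quantity as nDiff in the post
    std::size_t first_mismatch = 0; // index of the first wrong element (== size if none)
    bool contiguous_tail = true;    // true if every element from first_mismatch on is wrong
};

// Hypothetical diagnostic helper: counts mismatches like the original loop,
// and additionally records where they begin and whether they form one tail.
TailReport scan_tail(const std::vector<float>& v, float expected) {
    TailReport r;
    r.first_mismatch = v.size();
    for (std::size_t i = 0; i < v.size(); ++i) {
        if (v[i] != expected) {
            if (r.mismatches == 0) r.first_mismatch = i;
            ++r.mismatches;
        }
    }
    // The mismatches form a contiguous tail exactly when their count equals
    // the number of elements from the first mismatch to the end.
    r.contiguous_tail = (r.mismatches == v.size() - r.first_mismatch);
    assert(r.mismatches <= v.size());
    return r;
}
```

Calling `scan_tail(v2, 1.0f)` in place of the counting loop in foo_ would report where the zeroed region begins, in addition to nDiff.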

I also see the same issue when using a GTX 560 Ti 1 GB. Since it only happens on NVIDIA cards, could this be a driver bug?

However, the strange thing is that if I remove the "const" from the parameter in "void foo_( const array< float >& a1 )", it also works on the NVIDIA cards!

I am grateful for any insight you can offer on this matter. For example: What causes this problem? If it is a C++ AMP bug, is it fixed in VS2013? Is it a driver bug? Are there any similar bugs I should watch out for?

Cheers,
T
