Problem description
Suppose I have an std::array<float, 24> foo, which is the linearized STL counterpart of a column-major ArrayFire array, e.g. af::array bar = af::array(4, 3, 2, 1, f32). So I have an af::dim4 object dims with the dimensions of bar, up to 4 af::seq objects, and the linearized array foo.
How can I explicitly get the indices of foo (i.e. the linearized version of bar) that represent, for example, the 2nd and 3rd rows, i.e. bar(af::seq(1,2), af::span, af::span, af::span)? The small code example below shows what I want. At the end I also explain why I want this.
af::dim4 bigDims = af::dim4(4,3,2);
std::array<float, 24> foo; // Resides in RAM and is big
float* selBuffer_ptr; // Necessary for AF correct type autodetection
std::vector<float> selBuffer;
// Load some data into foo
af::array selection; // Resides in VRAM and is small
af::seq selRows = af::seq(1,2);
af::seq selCols = af::seq(bigDims[1]); // Emulates af::span
af::seq selSlices = af::seq(bigDims[2]); // Emulates af::span
af::dim4 selDims = af::dim4(selRows.size, selCols.size, selSlices.size);
dim_t* linIndices;
// Magic functionality getting linear indices of the selection
// selRows x selCols x selSlices
// Assign all indexed elements to a consecutive memory region in selBuffer
// I know their positions within the full dataset, b/c I know the selection ranges.
selBuffer_ptr = selBuffer.data();
selection = af::array(selDims, selBuffer_ptr); // Copies just the selection to the device (e.g. GPU)
// Do sth. with selection and be happy
// I don't need to write back into the foo array.
ArrayFire must implement such logic internally in order to access elements, and I found several related classes/functions such as af::index, af::seqToDims, af::gen_indexing and af::array::operator(). However, I have not found an easy way out yet.
I thought about basically reimplementing operator() so that it works similarly but does not require a reference to an array object. But this might be wasted effort if there is an easy way to do it within the ArrayFire framework.
Background: The reason I want to do this is that ArrayFire does not allow data to be stored only in main memory (CPU context) while being linked against a GPU backend. Since I have a big chunk of data that only needs to be processed piece by piece, and the VRAM is quite limited, I'd like to instantiate af::array objects ad hoc from an STL container that always resides in main memory.
Of course I know that I could program some index magic to work around my problem, but I'd like to use fairly complicated af::seq objects, which could make an efficient implementation of the index logic complicated.
Recommended answer
After a discussion with Pavan Yalamanchili on Gitter I managed to get a working piece of code, which I want to share in case anybody else needs to hold their variables only in RAM and copy parts of them on use to VRAM, i.e. into the ArrayFire universe (if linked against OpenCL on GPU or Nvidia).
This solution will also help anybody who is already using AF elsewhere in their project and wants a convenient way of accessing a big linearized N-dimensional array (N <= 4).
// Compile as: g++ -lafopencl malloc2.cpp && ./a.out
#include <stdio.h>
#include <arrayfire.h>
#include <af/util.h>
#include <cstdlib>
#include <iostream>
#define M 3
#define N 12
#define O 2
#define SIZE (M * N * O)
int main() {
int _foo; // Dummy variable for pausing program
double* a = new double[SIZE]; // Allocate double array on CPU (Big Dataset!)
for(long i = 0; i < SIZE; i++) // Fill with entry numbers for easy debugging
a[i] = 1. * i + 1;
std::cin >> _foo; // Pause
std::cout << "Full array: ";
// Display full array, out of convenience from GPU
// Don't use this if "a" is really big, otherwise you'll still copy all the data to the VRAM.
af::array ar = af::array(M, N, O, a); // Copy a RAM -> VRAM
af_print(ar);
std::cin >> _foo; // Pause
// Select a subset of the full array in terms of af::seq
af::seq seq0 = af::seq(1,2,1); // Row 2-3
af::seq seq1 = af::seq(2,6,2); // Columns 3, 5 and 7
af::seq seq2 = af::seq(1,1,1); // Slice 2
// BEGIN -- Getting linear indices
af::array aidx0 = af::array(seq0);
af::array aidx1 = af::array(seq1).T() * M;
af::array aidx2 = af::reorder(af::array(seq2), 1, 2, 0) * M * N;
// GFOR mode makes the additions below broadcast across the mismatched dimensions
af::gforSet(true);
af::array aglobal_idx = aidx0 + aidx1 + aidx2;
af::gforSet(false);
aglobal_idx = af::flat(aglobal_idx).as(u64);
// END -- Getting linear indices
// Copy index list VRAM -> RAM (for easier/faster access)
uintl* global_idx = new uintl[aglobal_idx.dims(0)];
aglobal_idx.host(global_idx);
// Copy all indices into a new RAM array
double* a_sub = new double[aglobal_idx.dims(0)];
for(long i = 0; i < aglobal_idx.dims(0); i++)
a_sub[i] = a[global_idx[i]];
// Generate the "subset" array on GPU & display nicely formatted
af::array ar_sub = af::array(seq0.size, seq1.size, seq2.size, a_sub);
std::cout << "Subset array: "; // living on seq0 x seq1 x seq2
af_print(ar_sub);
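// Release the host buffers (the af::array constructors above copied the data)
delete[] global_idx;
delete[] a_sub;
delete[] a;
return 0;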
}
/*
g++ -lafopencl malloc2.cpp && ./a.out
Full array: ar
[3 12 2 1]
1.0000 4.0000 7.0000 10.0000 13.0000 16.0000 19.0000 22.0000 25.0000 28.0000 31.0000 34.0000
2.0000 5.0000 8.0000 11.0000 14.0000 17.0000 20.0000 23.0000 26.0000 29.0000 32.0000 35.0000
3.0000 6.0000 9.0000 12.0000 15.0000 18.0000 21.0000 24.0000 27.0000 30.0000 33.0000 36.0000
37.0000 40.0000 43.0000 46.0000 49.0000 52.0000 55.0000 58.0000 61.0000 64.0000 67.0000 70.0000
38.0000 41.0000 44.0000 47.0000 50.0000 53.0000 56.0000 59.0000 62.0000 65.0000 68.0000 71.0000
39.0000 42.0000 45.0000 48.0000 51.0000 54.0000 57.0000 60.0000 63.0000 66.0000 69.0000 72.0000
ar_sub
[2 3 1 1]
44.0000 50.0000 56.0000
45.0000 51.0000 57.0000
*/
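The index arithmetic in the block between BEGIN and END is the usual column-major formula idx = i0 + i1*M + i2*M*N (+ i3*M*N*O). As a rough host-only sketch of the same computation, assuming the selection is given as plain (begin, end, step) triples rather than af::seq objects, something like the following could be used. Range and linearIndices are hypothetical names made up for this illustration, not part of ArrayFire.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for af::seq: inclusive range begin..end with a positive step.
struct Range {
    std::size_t begin;
    std::size_t end;   // inclusive, like af::seq
    std::size_t step;  // must be > 0
};

// Returns the column-major linear indices of the selection r[0] x r[1] x r[2] x r[3]
// inside an array with extents dims[0..3]. The first dimension varies fastest,
// so the result is ordered like the flattened selection.
std::vector<std::size_t> linearIndices(const std::array<std::size_t, 4>& dims,
                                       const std::array<Range, 4>& r) {
    for (std::size_t d = 0; d < 4; ++d) {
        assert(r[d].step > 0 && r[d].begin <= r[d].end && r[d].end < dims[d]);
    }
    // Strides of dimensions 1, 2 and 3 (the stride of dimension 0 is 1).
    const std::size_t s1 = dims[0];
    const std::size_t s2 = s1 * dims[1];
    const std::size_t s3 = s2 * dims[2];

    std::vector<std::size_t> out;
    for (std::size_t l = r[3].begin; l <= r[3].end; l += r[3].step)
        for (std::size_t k = r[2].begin; k <= r[2].end; k += r[2].step)
            for (std::size_t j = r[1].begin; j <= r[1].end; j += r[1].step)
                for (std::size_t i = r[0].begin; i <= r[0].end; i += r[0].step)
                    out.push_back(i + j * s1 + k * s2 + l * s3);
    return out;
}
```

For the selection used above (rows 1..2, columns 2, 4, 6, slice 1 of a 3 x 12 x 2 array) this gives the indices 43, 44, 49, 50, 55, 56, which point at the values 44, 45, 50, 51, 56, 57 shown in the printed subset.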
The solution uses some undocumented AF functions and is supposedly slow because of the for loop running over global_idx, but so far it is really the best one can do if one wants to hold data exclusively in the CPU context and share only parts of it with AF's GPU context for processing.
If anybody knows a way to speed this code up, I'm still open to suggestions.
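One possible direction, offered only as an untested idea and not as something ArrayFire provides: when the row range has unit step, every selected column is a contiguous run in the column-major host buffer, so whole runs can be copied at once instead of element by element, and the index list never has to be built on the device. The helper gatherRows below is a hypothetical name invented for this sketch, and whether it is actually faster would have to be measured.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not an ArrayFire API.
// Gathers rows r0..r1 (inclusive, unit step) of the columns c0, c0+cstep, ... <= c1
// in slice s from a column-major M x N x O host buffer.
std::vector<double> gatherRows(const double* src,
                               std::size_t M, std::size_t N, std::size_t O,
                               std::size_t r0, std::size_t r1,
                               std::size_t c0, std::size_t c1, std::size_t cstep,
                               std::size_t s) {
    assert(r0 <= r1 && r1 < M);
    assert(c0 <= c1 && c1 < N && cstep > 0);
    assert(s < O);

    const std::size_t run = r1 - r0 + 1;  // elements per contiguous run
    std::vector<double> out;
    out.reserve(run * ((c1 - c0) / cstep + 1));

    for (std::size_t c = c0; c <= c1; c += cstep) {
        // Start of the selected rows within column c of slice s.
        const double* first = src + r0 + c * M + s * M * N;
        out.insert(out.end(), first, first + run);  // copy one whole run
    }
    return out;
}
```

With the data from the example above, gatherRows(a, 3, 12, 2, 1, 2, 2, 6, 2, 1) yields 44, 45, 50, 51, 56, 57, and the resulting buffer could then be handed to the af::array constructor in the same way as a_sub.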