Problem description
How do you efficiently transpose a matrix? Are there libraries for this, or what algorithm would you use?
For example:
#define W 3
#define H 2
short src[W*H] = {
    1, 2, 3,
    4, 5, 6
};
short dest[W*H];
rotate_90_clockwise(dest, src, W, H); // <-- magic in here, no need for in-place
// dest is now:
// {
//     4, 1,
//     5, 2,
//     6, 3
// }
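To have something to check faster versions against, here is a straightforward scalar sketch. It uses the question's `rotate_90_clockwise(dest, src, W, H)` call shape (the body is an assumption, not the asker's code) and adds a plain `transpose` for comparison. Note that the example is a clockwise rotation, which is a transpose followed by reversing the order of the elements within each output row.

```c
#include <assert.h>

/* Scalar reference versions: one load and one store per element.
 * src holds H rows of W elements (row-major); dest receives W rows of H. */
void rotate_90_clockwise(short *dest, const short *src, int W, int H)
{
    assert(dest != src);  /* out-of-place only, as in the question */
    for (int y = 0; y < H; y++)
        for (int x = 0; x < W; x++)
            /* src row y, column x lands in dest row x, column H-1-y */
            dest[x * H + (H - 1 - y)] = src[y * W + x];
}

void transpose(short *dest, const short *src, int W, int H)
{
    assert(dest != src);
    for (int y = 0; y < H; y++)
        for (int x = 0; x < W; x++)
            /* src row y, column x lands in dest row x, column y */
            dest[x * H + y] = src[y * W + x];
}
```

With the 3x2 example, `rotate_90_clockwise` yields `4,1, 5,2, 6,3` and `transpose` yields `1,4, 2,5, 3,6`.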
(In my specific case, the src array is raw image data and the destination is a framebuffer. I'm on an embedded ARM target, with a toolchain that doesn't support assembly.)
Recommended answer
In some cases there are libraries for this. More importantly, there are tricks you can play with vectorized data (e.g., four 32-bit elements in a 128-bit vector; the same idea applies to four 8-bit bytes in a 32-bit register) that are faster than accessing individual elements.
For a transpose, the standard idea is that you use "shuffle" instructions, which allow you to create a new data vector out of two existing vectors, in any order. You work with 4x4 blocks of the input array. So, starting out, you have:
v0 = 1 2 3 4
v1 = 5 6 7 8
v2 = 9 A B C
v3 = D E F 0
Then you apply shuffle instructions to the first two vectors (interleaving their odd elements, i.e. the 1st and 3rd, A0C0 B0D0 -> ABCD, and interleaving their even elements, 0A0C 0B0D -> ABCD), and likewise to the last two, to create a new set of vectors in which each 2x2 block is transposed:
1 5 3 7
2 6 4 8
9 D B F
A E C 0
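A minimal plain-C sketch of this first stage, using a hypothetical `vec4` struct in place of a real SIMD register and a hypothetical helper `trn_2x2`. On NEON, the `vtrnq_u32` intrinsic performs this same 2x2-block transpose on 32-bit lanes.

```c
#include <assert.h>

/* Hypothetical 4-lane vector standing in for one SIMD register. */
typedef struct { int e[4]; } vec4;

/* Stage 1, emulated: transpose the 2x2 blocks formed by rows a and b.
 * *lo gets the 1st and 3rd elements of a and b interleaved: a0 b0 a2 b2.
 * *hi gets the 2nd and 4th elements interleaved:            a1 b1 a3 b3. */
static void trn_2x2(vec4 a, vec4 b, vec4 *lo, vec4 *hi)
{
    vec4 l = {{ a.e[0], b.e[0], a.e[2], b.e[2] }};
    vec4 h = {{ a.e[1], b.e[1], a.e[3], b.e[3] }};
    *lo = l;
    *hi = h;
}
```

Calling `trn_2x2(v0, v1, ...)` and `trn_2x2(v2, v3, ...)` gives the four rows listed above: two shuffles per call, four in total.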
Finally, you apply shuffle instructions to the odd pair (the 1st and 3rd vectors) and to the even pair (the 2nd and 4th), combining their first pairs of elements (AB00 CD00 -> ABCD) and their last pairs (00AB 00CD -> ABCD), to get:
1 5 9 D
2 6 A E
3 7 B F
4 8 C 0
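The second stage can be sketched the same way. Here `vec4` and `combine_pairs` are again hypothetical stand-ins, and the inputs are the four vectors produced by the first stage.

```c
#include <assert.h>

/* Hypothetical 4-lane vector standing in for one SIMD register. */
typedef struct { int e[4]; } vec4;

/* Stage 2, emulated: from two stage-1 vectors a and c,
 * *lo takes the first pair of each (AB00 CD00 -> ABCD),
 * *hi takes the last pair of each  (00AB 00CD -> ABCD). */
static void combine_pairs(vec4 a, vec4 c, vec4 *lo, vec4 *hi)
{
    vec4 l = {{ a.e[0], a.e[1], c.e[0], c.e[1] }};
    vec4 h = {{ a.e[2], a.e[3], c.e[2], c.e[3] }};
    *lo = l;
    *hi = h;
}
```

Applied as `combine_pairs(row0, row2, &out0, &out2)` for the odd pair and `combine_pairs(row1, row3, &out1, &out3)` for the even pair, this produces the four transposed rows with four more shuffles.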
And there, 16 elements transposed in eight instructions!
Now, for 8-bit bytes in 32-bit registers, ARM doesn't have true shuffle instructions, but you can synthesize the first set of shuffles with shifts and the SEL (select) instruction, and you can do each shuffle in the second set with a single PKHBT (pack halfword bottom top) or PKHTB (pack halfword top bottom) instruction.
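A portable-C sketch of the byte case, assuming each row of a 4x4 byte block is packed into one `uint32_t` with element i in bits 8i..8i+7 (what a little-endian 32-bit load of four consecutive bytes gives). The byte masks stand in for SEL, and the halfword merges are the operations PKHBT/PKHTB perform in one instruction each; whether a particular compiler actually emits those instructions for this C is not guaranteed. `transpose4x4_bytes` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Transpose a 4x4 block of bytes, one row per 32-bit word, in place.
 * Row k = [k0 k1 k2 k3] with k0 in the low byte. Rows are a, b, c, d. */
static void transpose4x4_bytes(uint32_t r[4])
{
    /* Stage 1 (shifts + byte select): transpose each 2x2 byte block.
     * t0 = a0 b0 a2 b2, t1 = a1 b1 a3 b3, t2 = c0 d0 c2 d2, t3 = c1 d1 c3 d3 */
    uint32_t t0 = (r[0] & 0x00FF00FFu) | ((r[1] << 8) & 0xFF00FF00u);
    uint32_t t1 = ((r[0] >> 8) & 0x00FF00FFu) | (r[1] & 0xFF00FF00u);
    uint32_t t2 = (r[2] & 0x00FF00FFu) | ((r[3] << 8) & 0xFF00FF00u);
    uint32_t t3 = ((r[2] >> 8) & 0x00FF00FFu) | (r[3] & 0xFF00FF00u);

    /* Stage 2 (halfword packing, as PKHBT / PKHTB would do). */
    r[0] = (t0 & 0x0000FFFFu) | (t2 << 16);  /* a0 b0 c0 d0 */
    r[1] = (t1 & 0x0000FFFFu) | (t3 << 16);  /* a1 b1 c1 d1 */
    r[2] = (t0 >> 16) | (t2 & 0xFFFF0000u);  /* a2 b2 c2 d2 */
    r[3] = (t1 >> 16) | (t3 & 0xFFFF0000u);  /* a3 b3 c3 d3 */
}
```

That is twelve mask/shift/or steps for stage 1 where SEL-based code would need fewer, but it stays in plain C, which matters when the toolchain has no assembler support.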
Finally, if you're using a larger ARM processor with NEON, you can do something similar with 128-bit vectors of sixteen 8-bit elements on 16x16 blocks.
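A sketch of the two-stage 4x4 transpose from the walkthrough using NEON intrinsics, assuming your compiler provides `arm_neon.h` (intrinsics are ordinary C calls, so they may be usable even where assembly is not). It shows the 4x4 case with 32-bit lanes, not the 16x16 byte case. `transpose4x4_u32` is a hypothetical name, and the scalar fallback exists only so the sketch also builds on non-NEON targets.

```c
#include <assert.h>
#include <stdint.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Transpose a row-major 4x4 block of 32-bit elements from src into dst.
 * Strides are in elements; src and dst must not overlap. */
static void transpose4x4_u32(uint32_t *dst, int dst_stride,
                             const uint32_t *src, int src_stride)
{
#if defined(__ARM_NEON)
    uint32x4_t v0 = vld1q_u32(src + 0 * src_stride);
    uint32x4_t v1 = vld1q_u32(src + 1 * src_stride);
    uint32x4_t v2 = vld1q_u32(src + 2 * src_stride);
    uint32x4_t v3 = vld1q_u32(src + 3 * src_stride);

    /* Stage 1: VTRN transposes each 2x2 block. */
    uint32x4x2_t t01 = vtrnq_u32(v0, v1);  /* {1 5 3 7}, {2 6 4 8} */
    uint32x4x2_t t23 = vtrnq_u32(v2, v3);  /* {9 D B F}, {A E C 0} */

    /* Stage 2: join first halves, then last halves, of the odd/even pairs. */
    vst1q_u32(dst + 0 * dst_stride,
              vcombine_u32(vget_low_u32(t01.val[0]), vget_low_u32(t23.val[0])));
    vst1q_u32(dst + 1 * dst_stride,
              vcombine_u32(vget_low_u32(t01.val[1]), vget_low_u32(t23.val[1])));
    vst1q_u32(dst + 2 * dst_stride,
              vcombine_u32(vget_high_u32(t01.val[0]), vget_high_u32(t23.val[0])));
    vst1q_u32(dst + 3 * dst_stride,
              vcombine_u32(vget_high_u32(t01.val[1]), vget_high_u32(t23.val[1])));
#else
    /* Scalar fallback for targets without NEON. */
    for (int r = 0; r < 4; r++)
        for (int c = 0; c < 4; c++)
            dst[c * dst_stride + r] = src[r * src_stride + c];
#endif
}
```

The strides let the same routine walk 4x4 tiles of a larger image; a full-image transpose would loop over tiles and handle any leftover edge rows and columns with the scalar code.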