问题描述
用于乘以 4x4 矩阵的朴素算法如下所示:
The naive algorithm for multiplying 4x4 matrices looks like this:
void matrix_mul(double out[4][4], double lhs[4][4], double rhs[4][4]) {
for (int i = 0; i < 4; ++i) {
for (int j = 0; j < 4; ++j) {
out[i][j] = 0.0;
for (int k = 0; k < 4; ++k) {
out[i][j] += lhs[i][k] * rhs[k][j];
}
}
}
}
显然,如果out == lhs
或out == rhs
(此处==
表示引用相等),则该算法会给出虚假结果.是否有一个版本允许不简单地复制矩阵的一种或两种情况?如有必要,我很高兴为每种情况提供不同的功能.
Obviously, this algorithm gives bogus results if out == lhs
or out == rhs
(here ==
means reference equality). Is there a version that allows one or both of those cases that doesn't simply copy the matrix? I'm happy to have different functions for each case if necessary.
我找到了 这篇 论文,但它讨论了 Strassen-Winograd 算法对我的小矩阵来说太过分了.this 问题的答案似乎表明如果 out == lhs &&out == rhs
(即我们正在尝试对矩阵进行平方),那么它就无法就地完成,但即使没有令人信服的证据或证明.
I found this paper but it discusses the Strassen-Winograd algorithm which is overkill for my small matrices. The answers to this question seem to indicate that if out == lhs && out == rhs
(i.e., we're attempting to square the matrix), then it can't be done in place, but even there there's no convincing evidence or proof.
推荐答案
我对这个答案并不感到兴奋(我发布它主要是为了让这显然无法完成"的人群沉默),但我'我怀疑使用真正的就地算法(用于将两个 nxn 矩阵相乘的 O(1) 个额外存储字)是否可以做得更好.我们称这两个矩阵为 A 和 B 相乘.假设 A 和 B 没有别名.
I'm not thrilled with this answer (I'm posting it mainly to silence the "it obviously can't be done" crowd), but I'm skeptical that it's possible to do much better with a true in-place algorithm (O(1) extra words of storage for multiplying two n x n matrices). Let's call the two matrices to be multplied A and B. Assume that A and B are not aliased.
如果 A 是上三角,那么乘法问题将如下所示.
If A were upper-triangular, then the multiplication problem would look like this.
[a11 a12 a13 a14] [b11 b12 b13 b14]
[ 0 a22 a23 a24] [b21 b22 b23 b24]
[ 0 0 a33 a34] [b31 b32 b33 b34]
[ 0 0 0 a44] [b41 b42 b43 b44]
我们可以将乘积计算为 B,如下所示.将 B 的第一行乘以 a11
.将 a12
乘以 B 的第二行到第一行.将 a13
乘以 B 的第三行到第一行.将 a14
乘以 B 的第四行到第一行.
We can compute the product into B as follows. Multiply the first row of B by a11
. Add a12
times the second row of B to the first. Add a13
times the third row of B to the first. Add a14
times the fourth row of B to the first.
现在,我们用正确的产品覆盖了 B 的第一行.幸运的是,我们不再需要它了.将 B 的第二行乘以 a22
.将 a23
乘以 B 的第三行到第二行.(你懂的.)
Now, we've overwritten the first row of B with the correct product. Fortunately, we don't need it any more. Multiply the second row of B by a22
. Add a23
times the third row of B to the second. (You get the idea.)
同样,如果 A 是单位下三角,那么乘法问题将如下所示.
Likewise, if A were unit lower-triangular, then the multiplication problem would look like this.
[ 1 0 0 0 ] [b11 b12 b13 b14]
[a21 1 0 0 ] [b21 b22 b23 b24]
[a31 a32 1 0 ] [b31 b32 b33 b34]
[a41 a42 a43 1 ] [b41 b42 b43 b44]
将 a43
次添加到 B 的第三行到第四行.将 a42
乘以 B 的第二行到第四行.将 a41
乘以 B 的第一行到第四行.将 a32
乘以 B 的第二行到第三行.(你懂的.)
Add a43
times to third row of B to the fourth. Add a42
times the second row of B to the fourth. Add a41
times the first row of B to the fourth. Add a32
times the second row of B to the third. (You get the idea.)
完整的算法是原地 LU 分解 A,将 UB 乘以 B,将 LB 乘以 B,然后原地 LU-undecompose A(我不确定是否有人这样做过,但这似乎很容易反转步骤).有大约一百万个理由在实践中不实现这一点,其中两个原因是 A 可能不是 LU 可分解的,并且 A 通常不会用浮点运算完全重构.
The complete algorithm is to LU-decompose A in place, multiply U B into B, multiply L B into B, and then LU-undecompose A in place (I'm not sure if anyone ever does this, but it seems easy enough to reverse the steps). There are about a million reasons not to implement this in practice, two being that A may not be LU-decomposable and that A won't be reconstructed exactly in general with floating-point arithmetic.
这篇关于是否有一种算法可以就地乘以方阵?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!