Problem description
I assume numpy.cov(X) computes the sample covariance matrix as:
1/(N-1) * Sum (x_i - m)(x_i - m)^T (where m is the mean)
I.e. the sum of outer products. But nowhere in the documentation does it actually say this; it just says "Estimate a covariance matrix".
Can anyone confirm whether this is what it does internally? (I know I can change the constant out front with the bias parameter.)
Recommended answer
As you can see by looking at the source, in the simplest case with no masks, and N variables with M samples each, it returns the (N, N) covariance matrix calculated as:
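(x-m) * (x-m).T.conj() / (M - 1)
where * denotes the matrix product and m is each variable's mean taken over its M samples.
Roughly implemented as:
from numpy import dot

def cov(X):
    X = X - X.mean(axis=1, keepdims=True)  # subtract each variable's (row's) mean
    M = X.shape[1]                         # number of samples per variable
    fact = float(M - 1)
    return dot(X, X.T.conj()) / fact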
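As a quick sanity check, the formula can be compared against numpy.cov directly. This is only a sketch under the default arguments (rowvar=True, bias=False); the helper name manual_cov and the random test data are made up for illustration and are not part of NumPy.

```python
import numpy as np

def manual_cov(X):
    """Covariance of the rows of X (N variables x M samples). Hypothetical helper."""
    X = np.asarray(X)
    centered = X - X.mean(axis=1, keepdims=True)  # remove each variable's mean
    M = X.shape[1]                                # number of samples per variable
    return centered @ centered.T.conj() / (M - 1)

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 10))  # 3 variables, 10 samples each (arbitrary test data)
M = X.shape[1]

# Default numpy.cov divides by M - 1.
matches_default = np.allclose(manual_cov(X), np.cov(X))

# With bias=True the constant out front becomes 1/M instead of 1/(M - 1).
matches_biased = np.allclose(manual_cov(X) * (M - 1) / M, np.cov(X, bias=True))

print(matches_default, matches_biased)
```

If both comparisons hold on your installation, that supports the reading that only the normalisation constant changes with bias.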
If you want to review the source, look here instead of the link from Mr E unless you're interested in masked arrays. As you mentioned, the documentation isn't great.
The matrix product in the formula is effectively (but not exactly) an outer product: (x-m) has M column vectors of length N, one per sample, and thus (x-m).T has as many row vectors. The end result is the sum of all the outer products, one for each sample. For a single vector, the same * gives the inner (scalar) product if the order is reversed. Technically, though, these are both just standard matrix multiplications, and the true outer product is only the product of a column vector with a row vector.
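The equivalence between the single matrix product and the sum of per-sample outer products can be checked numerically. The snippet below is a minimal sketch on arbitrary random data, assuming real-valued input with variables in rows; the variable names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(3, 8))                   # 3 variables, 8 samples
centered = X - X.mean(axis=1, keepdims=True)  # subtract each variable's mean

# One outer product per sample, i.e. per column of the centered data.
outer_sum = sum(np.outer(col, col) for col in centered.T)

# A single matrix product yields the same (3, 3) matrix.
matrix_product = centered @ centered.T

same = np.allclose(outer_sum, matrix_product)
print(same)
```

Dividing either result by the number of samples minus one should then reproduce np.cov(X) for this input.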