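I'm new to Python, and I'd like to know why np.var(x) gives a different answer from the cov(x, x) entry in the output of np.cov(x, y). Shouldn't they be the same? I understand it has something to do with bias or ddof, and with normalization, but I'm not sure what that means, and I can't find any resource that specifically answers my question. Hoping someone can help!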

Best answer

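In numpy, cov defaults to a "delta degrees of freedom" (ddof) of 1, while var defaults to a ddof of 0. So by default cov divides by N - 1 and var divides by N. From the notes for numpy.var: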

Notes
-----
The variance is the average of the squared deviations from the mean,
i.e.,  ``var = mean(abs(x - x.mean())**2)``.

The mean is normally calculated as ``x.sum() / N``, where ``N = len(x)``.
If, however, `ddof` is specified, the divisor ``N - ddof`` is used
instead.  In standard statistical practice, ``ddof=1`` provides an
unbiased estimator of the variance of a hypothetical infinite population.
``ddof=0`` provides a maximum likelihood estimate of the variance for
normally distributed variables.
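
To make the divisor in the notes above concrete, here is a minimal sketch that computes the variance by hand with both divisors and compares the results with np.var. The sample values are made up for illustration and do not come from the question.

```python
import numpy as np

# Illustrative data only; not the x from the question.
x = np.array([1.0, 2.0, 4.0, 7.0])
n = len(x)

# Sum of squared deviations from the mean.
sq_dev_sum = ((x - x.mean()) ** 2).sum()

# ddof=0 divides by N; ddof=1 divides by N - 1.
manual_ddof0 = sq_dev_sum / n
manual_ddof1 = sq_dev_sum / (n - 1)

print(manual_ddof0, np.var(x))          # both use the divisor N
print(manual_ddof1, np.var(x, ddof=1))  # both use the divisor N - 1
```

The only difference between the two results is the divisor, which is why the gap shrinks as N grows.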


So you can get them to agree by passing the same ddof to both:

In [69]: cov(x,x)#defaulting to ddof=1
Out[69]:
array([[ 0.5,  0.5],
       [ 0.5,  0.5]])

In [70]: x.var(ddof=1)
Out[70]: 0.5

In [71]: cov(x,x,ddof=0)
Out[71]:
array([[ 0.25,  0.25],
       [ 0.25,  0.25]])

In [72]: x.var()#defaulting to ddof=0
Out[72]: 0.25
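
For a version that runs as a plain script, the sketch below repeats the comparison with explicit imports. The session above never shows how x was defined, so the two-element array here is an assumption, chosen only because it gives small, easy-to-check numbers.

```python
import numpy as np

# Hypothetical sample; the x used in the session above is not shown.
x = np.array([1.0, 2.0])

# np.cov defaults to ddof=1 (divisor N - 1); match it in var.
cov_ddof1 = np.cov(x, x)
var_ddof1 = np.var(x, ddof=1)

# np.var defaults to ddof=0 (divisor N); match it in cov.
cov_ddof0 = np.cov(x, x, ddof=0)
var_ddof0 = np.var(x)

# With x passed twice, every entry of the 2x2 matrix is the variance of x.
print(cov_ddof1, var_ddof1)
print(cov_ddof0, var_ddof0)
```

Either direction works; which ddof to standardize on depends on whether you want the sample (N - 1) or population (N) estimate.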
