Question
I was doing a Dataquest exercise and noticed that the variance I get differs between the two packages.
For example, for [1,2,3,4]:
from statistics import variance
import numpy as np
print(np.var([1,2,3,4]))
print(variance([1,2,3,4]))
# 1.25
# 1.6666666666666667
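Working the numbers out by hand shows where the two results come from. Both start from the same sum of squared deviations from the mean (5.0 for this data); they only divide it by a different number. This is a small illustrative sketch; the variable names are arbitrary.

```python
from statistics import mean, variance
import numpy as np

data = [1, 2, 3, 4]
m = mean(data)                                # 2.5
squared_devs = [(x - m) ** 2 for x in data]   # [2.25, 0.25, 0.25, 2.25]
ss = sum(squared_devs)                        # 5.0
n = len(data)

population_var = ss / n        # divide by n: what np.var returns by default
sample_var = ss / (n - 1)      # divide by n - 1: what statistics.variance returns

print(population_var, np.var(data))
print(sample_var, variance(data))
```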
The exercise's expected answer is computed with np.var().
Edit: I guess it has to do with the latter being the sample variance rather than the population variance. Could anyone explain the difference?
Answer
Use this:
print(np.var([1,2,3,4],ddof=1))
# 1.6666666666666667
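The two libraries line up once the divisor is made explicit. In this sketch, NumPy's default (ddof=0) matches statistics.pvariance, and ddof=1 matches statistics.variance, which is the function used in the question.

```python
import statistics
import numpy as np

data = [1, 2, 3, 4]

# Population variance: divisor N (NumPy default ddof=0, statistics.pvariance)
pop_np = np.var(data)
pop_stats = statistics.pvariance(data)

# Sample variance: divisor N - 1 (NumPy ddof=1, statistics.variance)
samp_np = np.var(data, ddof=1)
samp_stats = statistics.variance(data)

print(pop_np, pop_stats)
print(samp_np, samp_stats)
```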
Delta Degrees of Freedom: the divisor used in the calculation is N - ddof, where N represents the number of elements. By default, ddof is zero.
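Written as a formula, this is a restatement of the definition above, where the mean \(\bar{a}\) is always taken over all N elements and only the outer divisor depends on ddof:

```latex
\operatorname{var}(a) = \frac{1}{N - \mathrm{ddof}} \sum_{i=1}^{N} \left(a_i - \bar{a}\right)^2,
\qquad \bar{a} = \frac{1}{N} \sum_{i=1}^{N} a_i
```

For [1,2,3,4], \(\bar{a} = 2.5\) and the sum of squared deviations is 5, so ddof=0 gives 5/4 = 1.25 and ddof=1 gives 5/3 ≈ 1.667.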
The variance is the average of the squared deviations from the mean, x = abs(a - a.mean())**2. That average is normally calculated as x.sum() / N, where N = len(x). If, however, ddof is specified, the divisor N - ddof is used instead.
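A minimal sketch of that recipe in NumPy, assuming 1-D input; var_from_definition is a hypothetical helper name used for illustration, not part of NumPy:

```python
import numpy as np

def var_from_definition(a, ddof=0):
    """Hypothetical helper: variance via squared deviations and divisor N - ddof."""
    a = np.asarray(a, dtype=float)
    x = abs(a - a.mean()) ** 2   # squared deviations from the mean
    n = x.size                   # N; equals len(x) for 1-D input
    return x.sum() / (n - ddof)  # divisor N - ddof

print(var_from_definition([1, 2, 3, 4]))          # ddof=0
print(var_from_definition([1, 2, 3, 4], ddof=1))  # ddof=1
```

Comparing its output with np.var on the same data and the same ddof is a quick way to confirm the divisor convention.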
In standard statistical practice, ddof=1 provides an unbiased estimator of the variance of a hypothetical infinite population. ddof=0 provides a maximum likelihood estimate of the variance for normally distributed variables.
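The bias can be seen with a quick simulation. This sketch draws many small samples from a normal distribution with a known variance and averages each estimator; the seed, sample size, true variance, and number of trials are arbitrary choices for illustration. On average, ddof=1 lands near the true variance, while ddof=0 lands near true_var * (n - 1) / n, i.e. it systematically underestimates.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
true_var = 4.0     # population variance (standard deviation 2)
n = 5              # small samples make the bias easy to see
trials = 200_000

# Each row is one independent sample of size n from a normal with variance true_var
samples = rng.normal(loc=0.0, scale=np.sqrt(true_var), size=(trials, n))

mean_ddof0 = np.var(samples, axis=1, ddof=0).mean()  # should be close to 4.0 * 4 / 5 = 3.2
mean_ddof1 = np.var(samples, axis=1, ddof=1).mean()  # should be close to 4.0

print(mean_ddof0, mean_ddof1)
```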
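Statistical libraries such as NumPy divide by N by default (ddof=0) in what they call var (the variance) and std (the standard deviation). Python's statistics module instead divides by N - 1 in variance and stdev, and provides pvariance and pstdev for the divisor-N versions.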
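The same convention carries over to the standard deviation. A short sketch comparing np.std with the statistics module's two standard-deviation functions:

```python
import statistics
import numpy as np

data = [1, 2, 3, 4]

# np.std uses the same ddof convention as np.var
pop_std_np, pop_std_stats = np.std(data), statistics.pstdev(data)              # divisor N
samp_std_np, samp_std_stats = np.std(data, ddof=1), statistics.stdev(data)     # divisor N - 1

print(pop_std_np, pop_std_stats)
print(samp_std_np, samp_std_stats)
```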
For more information, refer to the NumPy documentation for numpy.var.