Problem description
I have the mean, std dev and n of sample 1 and sample 2. The samples are taken from the same population, but measured by different labs.
n is different for sample 1 and sample 2. I want to do a weighted (taking n into account) two-tailed t-test.
I tried using the scipy.stats module by generating my numbers with np.random.normal, since it only takes data and not summary statistics like the mean and std dev (is there any way to use these values directly?). But it didn't work, since the data arrays have to be of equal size.
Any help on how to get the p-value would be highly appreciated.
Recommended answer
If you have the original data as arrays a and b, you can use scipy.stats.ttest_ind with the argument equal_var=False:
t, p = ttest_ind(a, b, equal_var=False)
If you have only the summary statistics of the two data sets, you can calculate the t value using scipy.stats.ttest_ind_from_stats (added to scipy in version 0.16) or from the formula (http://en.wikipedia.org/wiki/Welch%27s_t_test).
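For reference, the formula in question (Welch's t-test with the Welch-Satterthwaite approximation for the degrees of freedom) can be written out as follows. The symbol names are chosen here for illustration: \bar{a}, s_a^2 and n_a stand for the mean, the sample variance (computed with ddof=1) and the size of the first sample, and likewise for b; T_\nu denotes a Student t variable with \nu degrees of freedom.

```latex
t = \frac{\bar{a} - \bar{b}}{\sqrt{\dfrac{s_a^2}{n_a} + \dfrac{s_b^2}{n_b}}},
\qquad
\nu \approx \frac{\left(\dfrac{s_a^2}{n_a} + \dfrac{s_b^2}{n_b}\right)^{2}}
                 {\dfrac{s_a^4}{n_a^2\,(n_a - 1)} + \dfrac{s_b^4}{n_b^2\,(n_b - 1)}},
\qquad
p = 2\,P\!\left(T_\nu \le -\lvert t \rvert\right)
```

The two-sided p-value is twice the lower tail of the t distribution evaluated at -|t|, and the two sample sizes enter only through the terms s^2/n, so they do not need to be equal.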
The following script shows the possibilities.
from __future__ import print_function
import numpy as np
from scipy.stats import ttest_ind, ttest_ind_from_stats
from scipy.special import stdtr
np.random.seed(1)
# Create sample data.
a = np.random.randn(40)
b = 4*np.random.randn(50)
# Use scipy.stats.ttest_ind.
t, p = ttest_ind(a, b, equal_var=False)
print("ttest_ind: t = %g p = %g" % (t, p))
# Compute the descriptive statistics of a and b.
abar = a.mean()
avar = a.var(ddof=1)
na = a.size
adof = na - 1
bbar = b.mean()
bvar = b.var(ddof=1)
nb = b.size
bdof = nb - 1
# Use scipy.stats.ttest_ind_from_stats.
t2, p2 = ttest_ind_from_stats(abar, np.sqrt(avar), na,
                              bbar, np.sqrt(bvar), nb,
                              equal_var=False)
print("ttest_ind_from_stats: t = %g p = %g" % (t2, p2))
# Use the formulas directly.
tf = (abar - bbar) / np.sqrt(avar/na + bvar/nb)
dof = (avar/na + bvar/nb)**2 / (avar**2/(na**2*adof) + bvar**2/(nb**2*bdof))
pf = 2*stdtr(dof, -np.abs(tf))
print("formula: t = %g p = %g" % (tf, pf))
Output:
ttest_ind: t = -1.5827 p = 0.118873
ttest_ind_from_stats: t = -1.5827 p = 0.118873
formula: t = -1.5827 p = 0.118873
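For the situation in the question, where only the mean, std dev and n from each lab are available, the formula part of the script can be wrapped in a small helper. This is a minimal sketch: the function name welch_t_from_stats and the lab values are made up for illustration, and the std dev inputs are assumed to be sample standard deviations (ddof=1).

```python
import math

from scipy.stats import t as student_t


def welch_t_from_stats(mean1, std1, n1, mean2, std2, n2):
    """Welch's two-sided t-test from summary statistics (hypothetical helper).

    std1 and std2 are assumed to be sample standard deviations (ddof=1).
    Returns (t statistic, approximate degrees of freedom, two-sided p-value).
    """
    # Squared standard error contributed by each sample.
    v1 = std1 ** 2 / n1
    v2 = std2 ** 2 / n2
    t_stat = (mean1 - mean2) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    dof = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    # Two-sided p-value: twice the upper tail beyond |t|.
    p_value = 2 * student_t.sf(abs(t_stat), dof)
    return t_stat, dof, p_value


# Made-up summary statistics for two labs with different n.
t_stat, dof, p_value = welch_t_from_stats(10.2, 1.5, 40, 9.6, 2.1, 55)
print("t = %g  dof = %g  p = %g" % (t_stat, dof, p_value))
```

The result should agree with ttest_ind_from_stats(..., equal_var=False) for the same inputs; the helper additionally returns the degrees of freedom, which the script above computes only inside the formula branch.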