我正在尝试使用JavaScript statistics.js库在bigquery中运行成对的t检验,但数值不一致。

我的代码如下:

 CREATE TEMPORARY FUNCTION ttest(a ARRAY<FLOAT64>, b ARRAY<FLOAT64>)
 RETURNS FLOAT64
 LANGUAGE js AS
"""
   return statistical.methods.tTestTwoSample(a, b);
   //statistical.methods.tTestTwoSample(sample1: Array, sample2: Array)
"""
OPTIONS (
library="gs://my_bucket/statistical.js"
);


SELECT
 ttest([9.96,   3.76,   1.17,   8.66,   5.25,   7.61,   5.8,    1.84,   7.06, 9.4,  2.99,   9.3,    9.01,   4.24,   3.52,   9.6,    7.59,   6.99,   9.62,   2.18] ,
    [3.96,  5.76,   7.17,   7.66,   9.25,   3.61,   4.8,    8.84,   6.06,   4.4,    1.99,   8.3,    9.01,   1.24,   5.52,   8.6,    5.59,   1.99,   7.62,   3.18]  ) ttest

使用gs://my_bucket/statistical.js作为JS软件包在Google存储中的保存位置。

该软件包可在此处https://github.com/adrien2p/statistical-js
两个样本t检验statistical.methods.tTestTwoSample(sample1: Array, sample2: Array)应该返回p值。我无法确定测试是否配对。当我运行代码时,得到4.099,据我所知这不是p值。我已经在Excel中运行了测试以使用=T.TEST()验证结果,并且在配对的双面测试中获得了0.4871的p值。我还尝试了除配对以外的其他t检验,并尝试将4.099结果视为T统计,而没有任何匹配的运气。

因此,我的问题是,如何在BigQuery中运行带有结果p值的配对t检验?我应该使用其他软件包吗?

在此先感谢您的帮助。

最佳答案

无需太多工作,我决定手动计算t统计量,并使用jStat库(可从github.com/jstat/jstat获得)将统计量转换为p值。

我的代码如下:

CREATE TEMPORARY FUNCTION tscore_to_p(a FLOAT64, b FLOAT64, c FLOAT64)
 RETURNS FLOAT64
 LANGUAGE js AS
"""
  return jStat.ttest(a,b,c); //jStat.ttest( tscore, n, sides)
"""
OPTIONS (
 library="gs://my_bucket/jstat.min.js"
);


WITH data AS (
 SELECT * FROM
 (SELECT 9.96 AS A, 3.96 AS B) UNION ALL
 (SELECT 3.76 AS A, 5.76 AS B) UNION ALL
 (SELECT 1.17 AS A, 7.17 AS B) UNION ALL
 (SELECT 8.66 AS A, 7.66 AS B) UNION ALL
 (SELECT 5.25 AS A, 9.25 AS B) UNION ALL
 (SELECT 7.61 AS A, 3.61 AS B) UNION ALL
 (SELECT 5.80 AS A, 4.80 AS B) UNION ALL
 (SELECT 1.84 AS A, 8.84 AS B) UNION ALL
 (SELECT 7.06 AS A, 6.06 AS B) UNION ALL
 (SELECT 9.40 AS A, 4.40 AS B) UNION ALL
 (SELECT 2.99 AS A, 1.99 AS B) UNION ALL
 (SELECT 9.30 AS A, 8.30 AS B) UNION ALL
 (SELECT 9.01 AS A, 9.01 AS B) UNION ALL
 (SELECT 4.24 AS A, 1.24 AS B) UNION ALL
 (SELECT 3.52 AS A, 5.52 AS B) UNION ALL
 (SELECT 9.60 AS A, 8.60 AS B) UNION ALL
 (SELECT 7.59 AS A, 5.59 AS B) UNION ALL
 (SELECT 6.99 AS A, 1.99 AS B) UNION ALL
 (SELECT 9.62 AS A, 7.62 AS B) UNION ALL
 (SELECT 2.18 AS A, 3.18 AS B)
 )


SELECT
   COUNT(*) n
 , COUNT(*)-1 dof
 , AVG(difference) mean
 , STDDEV_SAMP(difference) SD
 , STDDEV_SAMP(difference)/SQRT(COUNT(*)) SE
 , AVG(difference)/ (STDDEV_SAMP(difference)/SQRT(COUNT(*))) t
 , tscore_to_p((AVG(difference)/ (STDDEV_SAMP(difference)/SQRT(COUNT(*)))) COUNT(*), 2) p_value

FROM
  (SELECT *, (A-B) difference FROM data)

这样得出的p值为0.4871,与在Excel中运行的配对t检验的p值相同。

感谢所有考虑过此问题的人。

10-02 11:00