What is the Python equivalent of R's data.chisq$residuals?

Problem description

I have the following data:

array([[33, 250, 196, 136, 32],
       [55, 293, 190,  71, 13]])

I can get the p-value from stats.chi2_contingency(data).

Is there anything similar to the R object data.chisq$residuals for getting the Pearson residuals and the standardized residuals?

Solution

If you don't mind the dependency, statsmodels has a module for contingency table calculations. For example,

In [2]: import numpy as np

In [3]: import statsmodels.api as sm

In [4]: F = np.array([[33, 250, 196, 136, 32], [55, 293, 190,  71, 13]])

In [5]: table = sm.stats.Table(F)

In [6]: table.resid_pearson  # Pearson's residuals
Out[6]:
array([[-1.77162519, -1.61362277, -0.05718356,  2.96508777,  1.89079393],
       [ 1.80687785,  1.64573143,  0.05832142, -3.02408853, -1.92841787]])

In [7]: table.standardized_resids  # Standardized residuals
Out[7]:
array([[-2.62309082, -3.0471942 , -0.09791681,  4.6295814 ,  2.74991911],
       [ 2.62309082,  3.0471942 ,  0.09791681, -4.6295814 , -2.74991911]])

If you prefer not to depend on statsmodels, these calculations take only a few lines on top of the results of scipy.stats.chi2_contingency. Here's a short module that defines functions for both kinds of residual. They take the observed frequencies and the expected frequencies (as returned by chi2_contingency). Note that while chi2_contingency and the residuals function below work for n-dimensional arrays, stdres as implemented here handles only 2D arrays.

from __future__ import division  # only matters on Python 2

import numpy as np
from scipy.stats import chi2_contingency
from scipy.stats.contingency import margins


def residuals(observed, expected):
    # Pearson residuals: (O - E) / sqrt(E); works for n-dimensional tables.
    return (observed - expected) / np.sqrt(expected)

def stdres(observed, expected):
    # Standardized residuals for a 2D table, matching R's chisq.test()$stdres.
    n = observed.sum()
    rsum, csum = margins(observed)
    # With integers, the calculation
    #     csum * rsum * (n - rsum) * (n - csum) / n**3
    # might overflow, so convert n, rsum and csum to floating point.
    n = float(n)
    rsum = rsum.astype(np.float64)
    csum = csum.astype(np.float64)
    v = csum * rsum * (n - rsum) * (n - csum) / n**3
    return (observed - expected) / np.sqrt(v)
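Because the expected count of a cell is rsum * csum / n, the variance term v in stdres can also be written as expected * (1 - rsum/n) * (1 - csum/n). In other words, each standardized residual is the Pearson residual divided by sqrt((1 - rsum/n) * (1 - csum/n)). The sketch below uses that form and computes the expected counts itself with scipy.stats.contingency.expected_freq, so it needs only the observed table. The function name adjusted_residuals is hypothetical (it is not part of SciPy), and like stdres it is limited to 2D tables.

```python
import numpy as np
from scipy.stats.contingency import expected_freq, margins


def adjusted_residuals(observed):
    """Hypothetical helper: standardized residuals of a 2D contingency table.

    Uses V = E * (1 - r/n) * (1 - c/n), which equals
    r * c * (n - r) * (n - c) / n**3 because E = r * c / n.
    """
    observed = np.asarray(observed, dtype=np.float64)
    if observed.ndim != 2:
        raise ValueError("adjusted_residuals only handles 2D tables")
    n = observed.sum()
    rsum, csum = margins(observed)  # shapes (R, 1) and (1, C)
    expected = expected_freq(observed)
    pearson = (observed - expected) / np.sqrt(expected)
    # Scale each Pearson residual by 1 / sqrt((1 - r/n) * (1 - c/n)).
    return pearson / np.sqrt((1 - rsum / n) * (1 - csum / n))


F = np.array([[33, 250, 196, 136, 32], [55, 293, 190, 71, 13]])
print(adjusted_residuals(F))
```

For the data in the question this should agree with stdres(F, expected) and with R's result$stdres.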

With your data, we get:

>>> F = np.array([[33, 250, 196, 136, 32], [55, 293, 190, 71, 13]])

>>> chi2, p, dof, expected = chi2_contingency(F)

>>> residuals(F, expected)
array([[-1.77162519, -1.61362277, -0.05718356,  2.96508777,  1.89079393],
       [ 1.80687785,  1.64573143,  0.05832142, -3.02408853, -1.92841787]])

>>> stdres(F, expected)
array([[-2.62309082, -3.0471942 , -0.09791681,  4.6295814 ,  2.74991911],
       [ 2.62309082,  3.0471942 ,  0.09791681, -4.6295814 , -2.74991911]])
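A quick sanity check on the Pearson residuals: the chi-square statistic returned by chi2_contingency is the sum of their squares, as long as no continuity correction is applied. SciPy applies Yates' correction only when the table has one degree of freedom, which is not the case for this 2x5 table; passing correction=False below just makes the check hold for 2x2 tables as well.

```python
import numpy as np
from scipy.stats import chi2_contingency

F = np.array([[33, 250, 196, 136, 32], [55, 293, 190, 71, 13]])

# correction=False disables Yates' correction; for this table (dof = 4)
# SciPy would not apply it anyway.
chi2, p, dof, expected = chi2_contingency(F, correction=False)

# Pearson residuals, computed the same way as residuals(F, expected).
pearson = (F - expected) / np.sqrt(expected)

# Each squared residual is that cell's contribution to the statistic.
print(dof, chi2, np.sum(pearson ** 2))
```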

Here's the calculation in R for comparison:

> F <- as.table(rbind(c(33, 250, 196, 136, 32), c(55, 293, 190, 71, 13)))

> result <- chisq.test(F)

> result$residuals
            A           B           C           D           E
A -1.77162519 -1.61362277 -0.05718356  2.96508777  1.89079393
B  1.80687785  1.64573143  0.05832142 -3.02408853 -1.92841787

> result$stdres
            A           B           C           D           E
A -2.62309082 -3.04719420 -0.09791681  4.62958140  2.74991911
B  2.62309082  3.04719420  0.09791681 -4.62958140 -2.74991911

这篇关于python中R data.chisq$residuals的等价物是什么?的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持！