问题描述
我有一个矩阵data
,其中包含 m 行和 n 列.我曾经使用 :
I have a matrix data
with m rows and n columns. I used to compute the correlation coefficients between all pairs of rows using np.corrcoef
:
import numpy as np
data = np.array([[0, 1, -1], [0, -1, 1]])
np.corrcoef(data)
现在我也想看看这些系数的p值. np.corrcoef
不提供这些; scipy.stats.pearsonr
可以.但是,scipy.stats.pearsonr
不接受输入矩阵.
Now I would also like to have a look at the p-values of these coefficients. np.corrcoef
doesn't provide these; scipy.stats.pearsonr
does. However, scipy.stats.pearsonr
does not accept a matrix on input.
有没有一种快速的方法来计算所有成对的行的系数和p值(例如,到达两个 m 乘 m 矩阵,其中一个相关系数,另一个具有相应的p值)而不必手动遍历所有对?
Is there a quick way how to compute both the coefficient and the p-value for all pairs of rows (arriving e.g. at two m by m matrices, one with correlation coefficients, the other with corresponding p-values) without having to manually go through all pairs?
推荐答案
我今天遇到了同样的问题.
I have encountered the same problem today.
经过一个半小时的搜索,我在numpy/scipy库中找不到任何代码可以帮助我做到这一点.
After half an hour of googling, I can't find any code in numpy/scipy library can help me do this.
所以我写了自己的 corrcoef
import numpy as np
from scipy.stats import pearsonr, betai
def corrcoef(matrix):
r = np.corrcoef(matrix)
rf = r[np.triu_indices(r.shape[0], 1)]
df = matrix.shape[1] - 2
ts = rf * rf * (df / (1 - rf * rf))
pf = betai(0.5 * df, 0.5, df / (df + ts))
p = np.zeros(shape=r.shape)
p[np.triu_indices(p.shape[0], 1)] = pf
p[np.tril_indices(p.shape[0], -1)] = p.T[np.tril_indices(p.shape[0], -1)]
p[np.diag_indices(p.shape[0])] = np.ones(p.shape[0])
return r, p
def corrcoef_loop(matrix):
rows, cols = matrix.shape[0], matrix.shape[1]
r = np.ones(shape=(rows, rows))
p = np.ones(shape=(rows, rows))
for i in range(rows):
for j in range(i+1, rows):
r_, p_ = pearsonr(matrix[i], matrix[j])
r[i, j] = r[j, i] = r_
p[i, j] = p[j, i] = p_
return r, p
第一个版本使用np.corrcoef的结果,然后基于corrcoef矩阵的上三角值计算p值.
The first version use the result of np.corrcoef, and then calculate p-value based on triangle-upper values of corrcoef matrix.
第二个循环版本只是遍历行,请手动执行pearsonr.
The second loop version just iterating over rows, do pearsonr manually.
def test_corrcoef():
a = np.array([
[1, 2, 3, 4],
[1, 3, 1, 4],
[8, 3, 8, 5],
[2, 3, 2, 1]])
r1, p1 = corrcoef(a)
r2, p2 = corrcoef_loop(a)
assert np.allclose(r1, r2)
assert np.allclose(p1, p2)
测试通过,它们是相同的.
The test passed, they are the same.
def test_timing():
import time
a = np.random.randn(100, 2500)
def timing(func, *args, **kwargs):
t0 = time.time()
loops = 10
for _ in range(loops):
func(*args, **kwargs)
print('{} takes {} seconds loops={}'.format(
func.__name__, time.time() - t0, loops))
timing(corrcoef, a)
timing(corrcoef_loop, a)
if __name__ == '__main__':
test_corrcoef()
test_timing()
在Macbook上针对100x2500矩阵的性能
The performance on my Macbook against 100x2500 matrix
corrcoef_loop需要7.585600137710571秒的循环次数= 10
corrcoef_loop takes 7.585600137710571 seconds loops=10
这篇关于矩阵所有成对行的相关系数和p值的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!