pd.corrwith on pandas DataFrames with different column names

Question

I would like to get the Pearson r between x1 and each of the three columns in y, in an efficient manner.

It appears that pd.corrwith() is only able to calculate this for columns that have exactly the same column labels, e.g. x and y.

This seems a bit impractical, as I presume computing correlations between different variables would be a common problem.

In [1]: import pandas as pd; import numpy as np

In [2]: x = pd.DataFrame(np.random.randn(5,3),columns=['A','B','C'])

In [3]: y = pd.DataFrame(np.random.randn(5,3),columns=['A','B','C'])

In [4]: x1 = pd.DataFrame(x.iloc[:, 0])

In [5]: x.corrwith(y)
Out[5]:
A   -0.752631
B   -0.525705
C    0.516071
dtype: float64

In [6]: x1.corrwith(y)
Out[6]:
A   -0.752631
B         NaN
C         NaN
dtype: float64

Recommended answer

You can accomplish what you want using DataFrame.corrwith(Series) rather than DataFrame.corrwith(DataFrame):

In [203]: x1 = x['A']

In [204]: y.corrwith(x1)
Out[204]:
A    0.347629
B   -0.480474
C   -0.729303
dtype: float64
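
For readers who want to try this outside an IPython session, here is a minimal self-contained sketch of the same idea. The fixed seed, the column labels D/E/F, and the names r_series and r_manual are illustrative assumptions, not part of the original post. When the argument is a Series, corrwith correlates it with every column of the calling DataFrame, so the column labels of y do not need to match anything in x.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
x = pd.DataFrame(rng.standard_normal((5, 3)), columns=["A", "B", "C"])
# y deliberately uses different column labels than x
y = pd.DataFrame(rng.standard_normal((5, 3)), columns=["D", "E", "F"])

x1 = x["A"]  # a Series, not a one-column DataFrame

# Series argument: Pearson r between x1 and each column of y
r_series = y.corrwith(x1)

# Cross-check by computing each correlation individually
r_manual = pd.Series({col: y[col].corr(x1) for col in y.columns})

print(r_series)
```

The two results should agree, which suggests that the Series form is just a shorthand for looping over the columns of y.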

Alternatively, you can form the matrix of correlations between each column of x and each column of y as follows:

In [214]: pd.expanding_corr(x, y, pairwise=True).iloc[-1, :, :]
Out[214]:
          A         B         C
A  0.347629 -0.480474 -0.729303
B -0.334814  0.778019  0.654583
C -0.453273  0.212057  0.149544

Note that DataFrame.corrwith() does not have a pairwise=True option.
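
pd.expanding_corr comes from an older pandas API and, as far as I know, is no longer available in current releases. As a hedged alternative (not the answerer's method), the same cross-correlation matrix can be sketched by applying corrwith once per column of x. The seed and the names cross and check are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
x = pd.DataFrame(rng.standard_normal((5, 3)), columns=["A", "B", "C"])
y = pd.DataFrame(rng.standard_normal((5, 3)), columns=["A", "B", "C"])

# For each column of x, correlate it with every column of y.
# apply() puts the columns of x along the result's columns, so transpose
# to get rows = columns of x and columns = columns of y.
cross = x.apply(lambda col: y.corrwith(col)).T

# Independent check with NumPy: the top-right 3x3 block of the 6x6
# correlation matrix of [x | y] holds corr(x_i, y_j).
check = np.corrcoef(x.to_numpy(), y.to_numpy(), rowvar=False)[:3, 3:]

print(cross)
```

The first row of cross corresponds to y.corrwith(x['A']), matching the layout of the matrix shown in the answer.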
