python - Pandas compare MultiIndex DataFrames without looping

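I want to compare two MultiIndex DataFrames and add another column that shows the difference of the values whenever all index values match between the first and the second DataFrame, without using a loop.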

import numpy as np
import pandas as pd

index_a = [1,2,2,3,3,3]
index_b = [0,0,1,0,1,2]
index_c = [1,2,2,4,4,4]
index = pd.MultiIndex.from_arrays([index_a,index_b], names=('a','b'))
index_1 = pd.MultiIndex.from_arrays([index_c,index_b], names=('a','b'))
# only the (a, b) pairs (1, 0), (2, 0) and (2, 1) exist in both frames
df1 = pd.DataFrame(np.random.rand(6), index=index, columns=['p'])
df2 = pd.DataFrame(np.random.rand(6), index=index_1, columns=['q'])
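Because np.random.rand draws new numbers on every run, the p values in the outputs below differ from the expected result and from each other. A minimal sketch of the same setup with a seeded generator (the seed 0 is an arbitrary assumption, not part of the question) makes reruns reproducible and lists the (a, b) pairs the two frames share:

```python
import numpy as np
import pandas as pd

# Seeded generator: the seed value is arbitrary, it only makes reruns identical
rng = np.random.default_rng(0)

index_a = [1, 2, 2, 3, 3, 3]
index_b = [0, 0, 1, 0, 1, 2]
index_c = [1, 2, 2, 4, 4, 4]
index = pd.MultiIndex.from_arrays([index_a, index_b], names=('a', 'b'))
index_1 = pd.MultiIndex.from_arrays([index_c, index_b], names=('a', 'b'))
df1 = pd.DataFrame(rng.random(6), index=index, columns=['p'])
df2 = pd.DataFrame(rng.random(6), index=index_1, columns=['q'])

# (a, b) pairs present in both frames: only these rows can get a difference.
# The shared pairs should be (1, 0), (2, 0) and (2, 1).
common = df1.index.intersection(df2.index)
print(common.tolist())
```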

df1

df2

The resulting frame (df1 - df2) should look like this:

           p      diff
a b
1 0   0.4655   -0.1936
2 0   0.8600    0.2916
  1   0.9010    0.3321
3 0   0.0652  No Match
  1   0.5686  No Match
  2   0.8965  No Match
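The condition "all index values match" is a row-wise membership test on the whole (a, b) tuple, which MultiIndex.isin can evaluate without a loop. A minimal sketch, using small made-up p and q values (chosen so the arithmetic is exact), that builds the mask explicitly and uses it to label the unmatched rows:

```python
import pandas as pd

# Same index layout as the question; the p and q values are hypothetical
index = pd.MultiIndex.from_arrays([[1, 2, 2, 3, 3, 3], [0, 0, 1, 0, 1, 2]], names=('a', 'b'))
index_1 = pd.MultiIndex.from_arrays([[1, 2, 2, 4, 4, 4], [0, 0, 1, 0, 1, 2]], names=('a', 'b'))
df1 = pd.DataFrame({'p': [0.5, 0.75, 0.25, 0.125, 0.625, 0.875]}, index=index)
df2 = pd.DataFrame({'q': [0.25, 0.5, 0.125, 0.5, 0.5, 0.5]}, index=index_1)

# Boolean array: True where df1's (a, b) tuple also exists in df2's index
matched = df1.index.isin(df2.index)

# Difference on df1's rows; rows without a partner in df2 get NaN from reindex
diff = df1['p'] - df2['q'].reindex(df1.index)

# Keep the number where the index matched, otherwise write the label
df1['diff'] = diff.astype(object).where(matched, 'No Match')
print(df1)
```

Because the label comes from the mask rather than from NaN, a matched row whose q is itself NaN keeps NaN here instead of being reported as "No Match", which is what a plain fillna would do.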

Best answer

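Use reindex_like or reindex to align q to the index of df1: index pairs present in both frames (the intersection) keep their values, all others become NaN, which fillna then replaces with 'No Match':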

df1['new'] = (df1['p'] - df2['q'].reindex_like(df1)).fillna('No Match')
#alternative
#df1['new'] = (df1['p'] - df2['q'].reindex(df1.index)).fillna('No Match')
print(df1)
            p       new
a b
1 0  0.955587  0.924466
2 0  0.312497 -0.310224
  1  0.306256  0.231646
3 0  0.575613  No Match
  1  0.674605  No Match
  2  0.462807  No Match
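Filling with the string 'No Match' turns the new column into object dtype, so it can no longer be used directly in arithmetic. If the difference is needed for further calculations, one option (a sketch on made-up values, not part of the original answer) is to keep a numeric column with NaN and build the labelled column only for display:

```python
import pandas as pd

# Hypothetical fixed values on the question's index layout
index = pd.MultiIndex.from_arrays([[1, 2, 2, 3, 3, 3], [0, 0, 1, 0, 1, 2]], names=('a', 'b'))
index_1 = pd.MultiIndex.from_arrays([[1, 2, 2, 4, 4, 4], [0, 0, 1, 0, 1, 2]], names=('a', 'b'))
df1 = pd.DataFrame({'p': [0.5, 0.75, 0.25, 0.125, 0.625, 0.875]}, index=index)
df2 = pd.DataFrame({'q': [0.25, 0.5, 0.125, 0.5, 0.5, 0.5]}, index=index_1)

# Align q to df1's index as in the answer; missing (a, b) pairs become NaN
diff = df1['p'] - df2['q'].reindex(df1.index)

df1['diff'] = diff                    # float64 column, NaN means "no match in df2"
df1['new'] = diff.fillna('No Match')  # object column, intended for display only
print(df1.dtypes)
```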

Another idea, using Index.intersection and DataFrame.loc:

df1['new'] = (df1['p'] - df2.loc[df2.index.intersection(df1.index), 'q']).fillna('No Match')
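The loc/intersection variant and the reindex variant should produce the same numbers, because in both cases only the shared (a, b) pairs receive a q value and assigning back to df1 aligns the result to df1's index. A small sketch on made-up values that checks this:

```python
import pandas as pd

# Hypothetical fixed values on the question's index layout
index = pd.MultiIndex.from_arrays([[1, 2, 2, 3, 3, 3], [0, 0, 1, 0, 1, 2]], names=('a', 'b'))
index_1 = pd.MultiIndex.from_arrays([[1, 2, 2, 4, 4, 4], [0, 0, 1, 0, 1, 2]], names=('a', 'b'))
df1 = pd.DataFrame({'p': [0.5, 0.75, 0.25, 0.125, 0.625, 0.875]}, index=index)
df2 = pd.DataFrame({'q': [0.25, 0.5, 0.125, 0.5, 0.5, 0.5]}, index=index_1)

# Variant 1: select only the shared index pairs from df2, then subtract
shared = df2.index.intersection(df1.index)
df1['via_loc'] = df1['p'] - df2.loc[shared, 'q']

# Variant 2: reindex q to df1's index (missing pairs become NaN)
df1['via_reindex'] = df1['p'] - df2['q'].reindex(df1.index)

# Series.equals treats NaN in the same positions as equal
print(df1['via_loc'].equals(df1['via_reindex']))  # True
```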

Or use merge with a left join:

df = pd.merge(df1, df2, how='left', left_index=True, right_index=True)
df['new'] = (df['p'] - df['q']).fillna('No Match')
print(df)
            p         q       new
a b
1 0  0.789693  0.665148  0.124544
2 0  0.082677  0.814190 -0.731513
  1  0.762339  0.235435  0.526905
3 0  0.727695       NaN  No Match
  1  0.903596       NaN  No Match
  2  0.315999       NaN  No Match
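With merge, the indicator=True option adds a _merge column that records, per row, whether the (a, b) pair was found in both frames ('both') or only in df1 ('left_only'). A sketch on made-up values that derives the label from that column instead of from NaN:

```python
import pandas as pd

# Hypothetical fixed values on the question's index layout
index = pd.MultiIndex.from_arrays([[1, 2, 2, 3, 3, 3], [0, 0, 1, 0, 1, 2]], names=('a', 'b'))
index_1 = pd.MultiIndex.from_arrays([[1, 2, 2, 4, 4, 4], [0, 0, 1, 0, 1, 2]], names=('a', 'b'))
df1 = pd.DataFrame({'p': [0.5, 0.75, 0.25, 0.125, 0.625, 0.875]}, index=index)
df2 = pd.DataFrame({'q': [0.25, 0.5, 0.125, 0.5, 0.5, 0.5]}, index=index_1)

# Left join on the full MultiIndex; _merge is 'both' or 'left_only' per row
df = pd.merge(df1, df2, how='left', left_index=True, right_index=True, indicator=True)
matched = df['_merge'] == 'both'

# Numeric difference where the pair exists in df2, label otherwise
df['new'] = (df['p'] - df['q']).astype(object).where(matched, 'No Match')
df = df.drop(columns='_merge')
print(df)
```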

Regarding python - Pandas compare MultiIndex DataFrames without looping, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/52535214/