pandas merge (pd.merge): how to set the index and join

Question

I have two pandas DataFrames, dfLeft and dfRight, both with the date as the index.

dfLeft:

            cusip    factorL
date  
2012-01-03    XXXX      4.5
2012-01-03    YYYY      6.2
....
2012-01-04    XXXX      4.7
2012-01-04    YYYY      6.1
....

dfRight:

            idc__id    factorR
date  
2012-01-03    XXXX      5.0
2012-01-03    YYYY      6.0
....
2012-01-04    XXXX      5.1
2012-01-04    YYYY      6.2

Both have a shape of roughly (121900, 3).

I tried the following merge:

test = pd.merge(dfLeft, dfRight, left_index=True, right_index=True, left_on='cusip', right_on='idc__id', how = 'inner')

This gives test a shape of (60643500, 6).

Any recommendations on what is going wrong here? I want it to merge on both date and cusip/idc__id. Note: in this example the cusips are lined up, but in reality that may not be so.

Thanks.

Expected output for test:

             cusip    factorL    factorR
date  
2012-01-03    XXXX      4.5          5.0
2012-01-03    YYYY      6.2          6.0
....
2012-01-04    XXXX      4.7          5.1
2012-01-04    YYYY      6.1          6.2

Recommended answer

You could append 'cuspin' and 'idc_id' (the columns spelled cusip and idc__id in the question) as index levels to your DataFrames before you join. Here's how it would work on the first couple of rows:

In [10]: dfL
Out[10]: 
           cuspin  factorL
date                      
2012-01-03   XXXX      4.5
2012-01-03   YYYY      6.2

In [11]: dfL1 = dfLeft.set_index('cuspin', append=True)

In [12]: dfR1 = dfRight.set_index('idc_id', append=True)

In [13]: dfL1
Out[13]: 
                   factorL
date       cuspin         
2012-01-03 XXXX        4.5
           YYYY        6.2

In [14]: dfL1.join(dfR1)
Out[14]: 
                   factorL  factorR
date       cuspin                  
2012-01-03 XXXX        4.5        5
           YYYY        6.2        6
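
For anyone who wants to run this end to end, here is a minimal sketch on toy frames built from the values in the question. It first shows why the row count can explode when the match happens on the date index alone, then two ways to match on date and identifier together. The names by_date_only, left, right and joined are illustrative only, and renaming idc__id to cusip is my own choice: it gives both frames identically named keys, so the result does not depend on how join treats differently named index levels.

```python
import pandas as pd

# Toy versions of the frames from the question (values copied from the post).
# The right-hand rows are deliberately shuffled within each date.
dates = pd.to_datetime(["2012-01-03", "2012-01-03", "2012-01-04", "2012-01-04"])
dfLeft = pd.DataFrame(
    {"cusip": ["XXXX", "YYYY", "XXXX", "YYYY"], "factorL": [4.5, 6.2, 4.7, 6.1]},
    index=pd.Index(dates, name="date"),
)
dfRight = pd.DataFrame(
    {"idc__id": ["YYYY", "XXXX", "YYYY", "XXXX"], "factorR": [6.0, 5.0, 6.2, 5.1]},
    index=pd.Index(dates, name="date"),
)

# Matching on the date index alone pairs every left row with every right row
# that shares its date: 2 x 2 rows per date, over 2 dates.
by_date_only = pd.merge(dfLeft, dfRight, left_index=True, right_index=True, how="inner")
print(by_date_only.shape)  # (8, 4)

# Option 1: turn the date index into a column and merge on both keys.
left = dfLeft.reset_index()
right = dfRight.reset_index().rename(columns={"idc__id": "cusip"})
test = left.merge(right, on=["date", "cusip"], how="inner").set_index("date")
print(test)

# Option 2: the approach from the answer, with the identifier appended as a
# second index level under the same name on both sides before joining.
dfL1 = dfLeft.set_index("cusip", append=True)
dfR1 = dfRight.rename(columns={"idc__id": "cusip"}).set_index("cusip", append=True)
joined = dfL1.join(dfR1, how="inner")
print(joined)
```

Option 1 keeps cusip as an ordinary column, which matches the expected output in the question. Option 2 leaves it in a (date, cusip) MultiIndex, which is convenient if later lookups are by date and identifier.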

