Appending a level to the column index in python pandas


Question

I have several DataFrames with the same columns that I'd like to merge on their indices only.

print(df1)

out[]:               Value  ISO
       Id
       200001   8432000000  USD
       200230  22588186000  USD
       200247   4633000000  USD
       200291   1188880000  USD
       200418   1779776000  USD

print(df2)

out[]:               Value  ISO
      Id
      200001  1.309168e+11  USD
      200230  5.444096e+10  USD
      200247  9.499602e+09  USD
      200291  2.089603e+09  USD
      200418  3.827251e+09  USD

print(df3)

out[]:           Value
      Id
      200001  3.681908
      200230  3.408507
      200247  4.531866
      200291  0.273029
      200418  3.521822

I can use

pd.concat([df1, df2, df3], axis=1)

and get

out[]:              Value  ISO         Value  ISO     Value
      Id
      200001   8432000000  USD  1.309168e+11  USD  3.681908
      200230  22588186000  USD  5.444096e+10  USD  3.408507
      200247   4633000000  USD  9.499602e+09  USD  4.531866
      200291   1188880000  USD  2.089603e+09  USD  0.273029
      200418   1779776000  USD  3.827251e+09  USD  3.521822

But I lose the information about which DataFrame each column came from. I could also merge two DataFrames and use the suffixes parameter

print(df1.merge(df2, left_index=True, right_index=True, suffixes=('_1', '_2')))

and get

out[]:            Value_1 ISO_1       Value_2 ISO_2
      Id
      200001   8432000000   USD  1.309168e+11   USD
      200230  22588186000   USD  5.444096e+10   USD
      200247   4633000000   USD  9.499602e+09   USD
      200291   1188880000   USD  2.089603e+09   USD
      200418   1779776000   USD  3.827251e+09   USD

I can then daisy-chain my merges, but the suffixes parameter only applies to columns that share a name. Once the first merge has been suffixed, its column names are no longer in common with the third DataFrame, so that DataFrame's columns come through without any suffix.
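To make the limitation concrete, here is a minimal sketch of the daisy-chained merge. It assumes a three-row subset of the sample data shown above, and the second pair of suffixes (`'_12'`, `'_3'`) is chosen purely for illustration.

```python
import pandas as pd

# Three-row subset of the sample frames from the question (illustrative only).
ids = pd.Index([200001, 200230, 200247], name="Id")
df1 = pd.DataFrame({"Value": [8432000000, 22588186000, 4633000000],
                    "ISO": ["USD"] * 3}, index=ids)
df2 = pd.DataFrame({"Value": [1.309168e+11, 5.444096e+10, 9.499602e+09],
                    "ISO": ["USD"] * 3}, index=ids)
df3 = pd.DataFrame({"Value": [3.681908, 3.408507, 4.531866]}, index=ids)

# First merge: both frames have 'Value' and 'ISO', so both get suffixed.
merged = df1.merge(df2, left_index=True, right_index=True,
                   suffixes=("_1", "_2"))

# Second merge: 'Value' in df3 no longer collides with anything,
# so the suffixes are never applied and the source of 'Value' is lost.
merged = merged.merge(df3, left_index=True, right_index=True,
                      suffixes=("_12", "_3"))

print(list(merged.columns))
```

The last column keeps the bare name `Value`, which is exactly the ambiguity described above.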

I figured the solution would be to append a level to the column index of each DataFrame, carrying the information needed to distinguish those columns. Then I could run pd.concat() and get something that looks like this:

print(pd.concat([df1_, df2_, df3_], axis=1))

out[]:Source           df1                df2            df3
                     Value  ISO         Value  ISO     Value
      200001     8.432e+09  USD  1.309168e+11  USD  3.681908
      200230  2.258819e+10  USD  5.444096e+10  USD  3.408507
      200247     4.633e+09  USD  9.499602e+09  USD  4.531866
      200291   1.18888e+09  USD  2.089603e+09  USD  0.273029
      200418  1.779776e+09  USD  3.827251e+09  USD  3.521822

However, to get there I had to abuse the DataFrames like so:

df1_ = df1.T
df1_['Source'] = 'df1'
df1_.set_index('Source', append=True, inplace=True)
df1_.index = df1_.index.swaplevel(0, 1)
df1_ = df1_.T

Ultimately, I want a result that looks a lot like the last concat output. Is there a better way to get there? Is there a better way to append a level to the column index?

Thanks, PiR

Recommended answer

If you want a MultiIndex, you can build it directly in the concat call and get the same result, like:

pd.concat([df1, df2, df3], axis=1, keys=['df1', 'df2', 'df3'])

or, passing a dict whose keys become the outer column level:

pd.concat({'df1':df1, 'df2':df2, 'df3':df3}, axis=1)
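As a runnable sketch of the `keys` approach, the snippet below rebuilds a three-row subset of the sample data from the question and concatenates it. The `names=['Source']` argument is an addition not present in the answer; it is included only to label the new outer level the way the question's desired output does.

```python
import pandas as pd

# Three-row subset of the sample frames from the question (illustrative only).
ids = pd.Index([200001, 200230, 200247], name="Id")
df1 = pd.DataFrame({"Value": [8432000000, 22588186000, 4633000000],
                    "ISO": ["USD"] * 3}, index=ids)
df2 = pd.DataFrame({"Value": [1.309168e+11, 5.444096e+10, 9.499602e+09],
                    "ISO": ["USD"] * 3}, index=ids)
df3 = pd.DataFrame({"Value": [3.681908, 3.408507, 4.531866]}, index=ids)

# keys= adds an outer column level identifying the source frame;
# names= labels that new level (an optional extra, not in the answer).
result = pd.concat([df1, df2, df3], axis=1,
                   keys=["df1", "df2", "df3"], names=["Source"])

print(result)

# Selecting on the outer level returns one source's columns.
print(result["df2"])

# A full tuple selects a single column.
print(result[("df3", "Value")])
```

Because the source label lives in its own level, columns with the same name stay distinguishable without any renaming.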

See also the related question on building a multi-index DataFrame from a sequence of DataFrames.
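For the narrower question of appending a level to the column index of a single DataFrame without transposing it, one possible sketch uses `pd.MultiIndex.from_product`. The helper name `add_column_level` and its parameters are hypothetical, introduced here for illustration; they are not part of pandas or of the answer above.

```python
import pandas as pd

def add_column_level(df, label, name="Source"):
    """Return a copy of df whose columns gain an outer level set to `label`.

    Hypothetical helper for illustration; not a pandas API.
    """
    out = df.copy()
    # Pair the single label with every existing column name.
    out.columns = pd.MultiIndex.from_product(
        [[label], df.columns], names=[name, df.columns.name]
    )
    return out

# Three-row subset of df1 from the question (illustrative only).
ids = pd.Index([200001, 200230, 200247], name="Id")
df1 = pd.DataFrame({"Value": [8432000000, 22588186000, 4633000000],
                    "ISO": ["USD"] * 3}, index=ids)

df1_ = add_column_level(df1, "df1")
print(df1_)
```

Frames labelled this way can be passed to `pd.concat(..., axis=1)` as in the question, although passing `keys` to `concat` directly is shorter when all frames are combined at once.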
