This article covers how to concat/merge/join multiple pandas DataFrames that each have only one column.

Problem description

I have (more than) two dataframes:

In [22]: df = pd.DataFrame({'database' : ['db1', 'db2', 'db3']})

In [23]: df1 = pd.DataFrame({'database' : ['db1', 'db2', 'db3']})

In [24]: df2 = pd.DataFrame({'database' : ['db2', 'db3', 'db4']})

In [25]: df1
Out[25]:
  database
0      db1
1      db2
2      db3

In [26]: df2
Out[26]:
  database
0      db2
1      db3
2      db4

What I want as output is a DataFrame in this format:

Out[45]:
  database database
0      db1
1      db2      db2
2      db3      db3
3               db4

I managed to get it into this format like this:

df1.index = df1.database.values.ravel()
df2.index = df2.database.values.ravel()
pd.concat([df1, df2], axis=1).fillna('').reset_index(drop=True)
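The `ravel()` call is a no-op here. `df1.database.values` is already a one-dimensional array, so the index can be assigned directly from the column. The following is a minimal sketch of that simplification. `result` is an illustrative name, and the approach is still the question's own index-assignment trick.

```python
import pandas as pd

df1 = pd.DataFrame({'database': ['db1', 'db2', 'db3']})
df2 = pd.DataFrame({'database': ['db2', 'db3', 'db4']})

# The values of a single column are already 1-D, so ravel() cannot change their shape
print(df1.database.values.ndim)  # 1

# Use the column's own values as the index: concat(axis=1) aligns rows on it
df1.index = df1['database'].to_numpy()
df2.index = df2['database'].to_numpy()

# Unmatched rows come back as NaN; blank them and restore a 0..n-1 index
result = pd.concat([df1, df2], axis=1).fillna('').reset_index(drop=True)
print(result)
```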

But I think there must be a better solution than this trick with the ravel() function.

Recommended answer

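Use DataFrame.set_index with drop=False. This makes 'database' the index that concat aligns the rows on, while also keeping it as a regular column: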

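# Keep 'database' as a column (drop=False) and use it as the index concat aligns on;
# assign to a new name so df keeps its original data for the next example
out = (pd.concat([df1.set_index('database', drop=False),
                  df2.set_index('database', drop=False)], axis=1)
         .fillna('')
         .reset_index(drop=True))
print(out)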
  database database
0      db1
1      db2      db2
2      db3      db3
3               db4
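The question title also mentions `merge`. For two frames, a similar layout can be produced with a full outer merge, provided the key columns are first given distinct names so that both columns survive the merge. This is a swapped-in alternative, not the method from the answer. The column names `db_1`/`db_2` are hypothetical. Unlike index alignment with `concat`, `merge` also accepts repeated values in the key column and produces one row per matching pair.

```python
import pandas as pd

df1 = pd.DataFrame({'database': ['db1', 'db2', 'db3']})
df2 = pd.DataFrame({'database': ['db2', 'db3', 'db4']})

# Give each key column its own (hypothetical) name so both are kept after the merge
left = df1.rename(columns={'database': 'db_1'})
right = df2.rename(columns={'database': 'db_2'})

# Full outer join: rows pair up where db_1 == db_2; unmatched sides get NaN
merged = (pd.merge(left, right, left_on='db_1', right_on='db_2', how='outer')
            .fillna('')
            .reset_index(drop=True))
print(merged)
```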

A more dynamic solution with a list comprehension:

dfs = [df, df1, df2]
# Index every frame by its own 'database' values, keeping the column itself
dfs1 = [x.set_index('database', drop=False) for x in dfs]
df = (pd.concat(dfs1, axis=1)
        .fillna('')
        .reset_index(drop=True))
print(df)
  database database database
0      db1      db1
1      db2      db2      db2
2      db3      db3      db3
3                        db4
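With three or more frames, the repeated `database` headers make it hard to tell which column came from which input. `pd.concat` accepts a `keys` argument that labels each input and produces MultiIndex columns. The following is a minimal sketch that wraps the list-comprehension approach in a hypothetical helper, `align_on`, with illustrative key names. Like the answer above, it assumes the values in the column are unique within each frame, because aligning on an index with repeated labels is not supported.

```python
import pandas as pd


def align_on(frames, column, keys):
    """Hypothetical helper: line up `column` from each frame side by side.

    Each frame is indexed by `column` (kept as a column via drop=False) so
    that concat aligns rows on its values; `keys` labels each input frame.
    """
    indexed = [f.set_index(column, drop=False) for f in frames]
    return (pd.concat(indexed, axis=1, keys=keys)
              .fillna('')
              .reset_index(drop=True))


df = pd.DataFrame({'database': ['db1', 'db2', 'db3']})
df1 = pd.DataFrame({'database': ['db1', 'db2', 'db3']})
df2 = pd.DataFrame({'database': ['db2', 'db3', 'db4']})

out = align_on([df, df1, df2], 'database', keys=['df', 'df1', 'df2'])
print(out)

# Columns are now (frame label, column name) pairs, so one input can be selected
print(out[('df2', 'database')])
```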

08-18 19:51