pandas concat/merge/join multiple dataframes with only one column
Question
I have (more than) two dataframes:
In [22]: df = pd.DataFrame({'database' : ['db1', 'db2', 'db3']})
In [23]: df1 = pd.DataFrame({'database' : ['db1', 'db2', 'db3']})
In [24]: df2 = pd.DataFrame({'database' : ['db2', 'db3', 'db4']})
In [25]: df1
Out[25]:
  database
0      db1
1      db2
2      db3
In [26]: df2
Out[26]:
  database
0      db2
1      db3
2      db4
What I want as output is a dataframe in this format:
Out[45]:
  database database
0      db1
1      db2      db2
2      db3      db3
3               db4
I managed to get it in this format like this:
df1.index = df1.database.values.ravel()
df2.index = df2.database.values.ravel()
pd.concat([df1, df2], axis=1).fillna('').reset_index(drop=True)
But I think there must be a better solution than this trick with the ravel() function.
Recommended answer
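Use DataFrame.set_index with drop=False, so that the database column is kept while its values also become the index that concat aligns on: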
# Stored in `result` so that the original df is not overwritten
result = (pd.concat([df1.set_index('database', drop=False),
                     df2.set_index('database', drop=False)], axis=1)
            .fillna('')
            .reset_index(drop=True))
print(result)
  database database
0      db1
1      db2      db2
2      db3      db3
3               db4
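For readers who want to run the answer in one go, here is a minimal self-contained sketch of the same steps, split apart so the alignment is visible. The intermediate names `left`, `right`, and `aligned` are introduced only for illustration and are not part of the original answer.

```python
import pandas as pd

df1 = pd.DataFrame({'database': ['db1', 'db2', 'db3']})
df2 = pd.DataFrame({'database': ['db2', 'db3', 'db4']})

# drop=False keeps 'database' as a column while also using it as the index
left = df1.set_index('database', drop=False)
right = df2.set_index('database', drop=False)

# axis=1 places the frames side by side and aligns rows on the index labels;
# the default join='outer' keeps every label and leaves NaN where one is missing
aligned = pd.concat([left, right], axis=1)

# Replace the NaN gaps with empty strings and go back to a 0..n-1 index
result = aligned.fillna('').reset_index(drop=True)
print(result)
```

This assumes the values inside each frame are unique, as they are in the question, since the alignment relies on them as index labels.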
A more dynamic solution with a list comprehension:
dfs = [df, df1, df2]
dfs1 = [x.set_index('database', drop=False) for x in dfs]
df = (pd.concat(dfs1, axis=1)
        .fillna('')
        .reset_index(drop=True))
print(df)
  database database database
0      db1      db1
1      db2      db2      db2
2      db3      db3      db3
3                        db4
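One side effect of the result above is that every output column is named database, which makes it awkward to select a single one afterwards. A possible variation, not taken from the original answer, is to give each frame a label and concatenate renamed Series instead. The labels `'df'`, `'df1'`, `'df2'` and the names `frames`, `columns`, and `wide` are hypothetical choices for this sketch.

```python
import pandas as pd

# Hypothetical labels mapped to the frames from the question
frames = {
    'df': pd.DataFrame({'database': ['db1', 'db2', 'db3']}),
    'df1': pd.DataFrame({'database': ['db1', 'db2', 'db3']}),
    'df2': pd.DataFrame({'database': ['db2', 'db3', 'db4']}),
}

# For each frame, take the 'database' column indexed by its own values
# and rename the resulting Series to the frame's label
columns = [
    frame.set_index('database', drop=False)['database'].rename(label)
    for label, frame in frames.items()
]

# Concatenating Series along axis=1 yields one column per Series name
wide = pd.concat(columns, axis=1).fillna('').reset_index(drop=True)
print(wide)
```

With distinct column names, an expression such as `wide['df2']` returns a single column rather than all three.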