Question
Given two DataFrames with non-unique indexes and multi-level columns:
ars:
arsenal arsenal arsenal arsenal
NaN B3 SK BX BY
2015-04-15 NaN NaN NaN 26.0
2015-04-14 NaN NaN NaN NaN
2015-04-13 26.0 26.0 23.0 NaN
2015-04-13 22.0 21.0 19.0 NaN
che:
chelsea chelsea chelsea chelsea
NaN B3 SK BX BY
2015-04-15 NaN NaN NaN 1.01
2015-04-14 1.02 NaN NaN NaN
2015-04-14 NaN 1.05 NaN NaN
Here is the same data in CSV format (ars first, then che):
,arsenal,arsenal,arsenal,arsenal
,B3,SK,BX,BY
2015-04-15,,,,26.0
2015-04-14,,,,
2015-04-13,26.0,26.0,23.0,
2015-04-13,22.0,21.0,19.0,
,chelsea,chelsea,chelsea,chelsea
,B3,SK,BX,BY
2015-04-15,,,,1.01
2015-04-14,1.02,,,
2015-04-14,,1.05,,
I would like to join/merge them with something like an outer join, so that no rows are dropped.
I would like the output to be:
arsenal arsenal arsenal arsenal chelsea chelsea chelsea chelsea
NaN B3 SK BX BY B3 SK BX BY
2015-04-15 NaN NaN NaN 26.0 NaN NaN NaN 1.01
2015-04-14 NaN NaN NaN NaN 1.02 NaN NaN NaN
2015-04-14 NaN NaN NaN NaN NaN 1.05 NaN NaN
2015-04-13 26.0 26.0 23.0 NaN NaN NaN NaN NaN
2015-04-13 22.0 21.0 19.0 NaN NaN NaN NaN NaN
None of the pandas tools I know worked: merge, join, concat. merge's outer join gives a Cartesian product of the rows that share a date, which is not what I am looking for, while concat can't handle non-unique indexes.
Do you have any ideas how this can be achieved?
Note: the lengths of the DataFrames won't be identical.
Recommended answer
I've managed to sort it out using pandas' concat method.
First, we need to add a MultiIndex level so that the index becomes unique:
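import pandas as pd

# Read each file with a two-row column header and the dates as the index.
ars = pd.read_csv("ars.csv", index_col=[0], header=[0, 1])
che = pd.read_csv("che.csv", index_col=[0], header=[0, 1])

# Append a running row number as a second index level so each (date, num) pair is unique.
ars.index.name = "date"
ars["num"] = range(len(ars.index))
ars = ars.set_index("num", append=True)

che.index.name = "date"
che["num"] = range(len(che.index))
che = che.set_index("num", append=True)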
Now we can use concat:
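# Align the two frames side by side on the unique (date, num) index.
df = pd.concat([ars, che], axis=1)
# Newest date first, keeping the original row order within each date.
df = df.sort_index(level=["date", "num"], ascending=[False, True])
# Drop the helper level so only the date remains as the index.
df.index = df.index.droplevel("num")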
Output:
arsenal chelsea
B3 SK BX BY B3 SK BX BY
date
2015-04-15 NaN NaN NaN 26 NaN NaN NaN 1.01
2015-04-14 NaN NaN NaN NaN 1.02 NaN NaN NaN
2015-04-14 NaN NaN NaN NaN NaN 1.05 NaN NaN
2015-04-13 26 26 23 NaN NaN NaN NaN NaN
2015-04-13 22 21 19 NaN NaN NaN NaN NaN
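One caveat with numbering rows by their position in the whole frame: it lines up for this sample, but two rows with the same date only pair up if they happen to sit at the same position in both frames. A possible variant, sketched below, numbers the repeats within each date using groupby(...).cumcount() instead. The helper names make_frame and add_occurrence_level are made up for this illustration, and the frames are built in memory rather than read from the CSV files, so treat it as a sketch under those assumptions rather than the answer's own method.

```python
import numpy as np
import pandas as pd

nan = np.nan


def make_frame(team, dates, rows):
    # Hypothetical helper: two-level columns (team, code), mirroring the CSV header rows.
    cols = pd.MultiIndex.from_product([[team], ["B3", "SK", "BX", "BY"]])
    return pd.DataFrame(rows, index=pd.Index(dates, name="date"), columns=cols)


ars = make_frame(
    "arsenal",
    ["2015-04-15", "2015-04-14", "2015-04-13", "2015-04-13"],
    [[nan, nan, nan, 26.0],
     [nan, nan, nan, nan],
     [26.0, 26.0, 23.0, nan],
     [22.0, 21.0, 19.0, nan]],
)
che = make_frame(
    "chelsea",
    ["2015-04-15", "2015-04-14", "2015-04-14"],
    [[nan, nan, nan, 1.01],
     [1.02, nan, nan, nan],
     [nan, 1.05, nan, nan]],
)


def add_occurrence_level(df):
    # Hypothetical helper: number repeated dates 0, 1, 2, ... within each date
    # and attach that counter as a second index level named "num".
    occurrence = df.groupby(level="date").cumcount()
    out = df.copy()
    out.index = pd.MultiIndex.from_arrays(
        [df.index, occurrence.to_numpy()], names=["date", "num"]
    )
    return out


# Align on the now-unique (date, num) pairs; rows missing on one side become NaN.
merged = pd.concat([add_occurrence_level(ars), add_occurrence_level(che)], axis=1)
# Newest date first, repeats in their original order.
merged = merged.sort_index(level=["date", "num"], ascending=[False, True])
# Drop the helper level again.
merged.index = merged.index.droplevel("num")
print(merged)
```

With per-date numbering, the n-th row for a date in one frame is paired with the n-th row for that date in the other, regardless of how many rows precede it.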