问题描述
我有两个熊猫数据框.第一个看起来像(db1
):
I have two Pandas Data Frames. First one looks like (db1
):
id date_from date_until v1 v2
0 2015-06-17 2015-06-19 2 3
0 2015-06-17 2015-07-01 3 4
1 2015-06-18 2015-06-20 5 6
1 2015-06-19 2015-07-12 5 4
和第二个(db2
),如下所示:
and the second one (db2
), like this:
id date v3 v4
0 2015-06-17 4 3
1 2015-06-17 5 6
1 2015-06-18 6 4
不,我想压缩它们,以便它们基于id
和date
.如果db2.date
大于或等于db1.date_from
,并且使用db2
中最小的那个,则日期压缩应该仍然大于或等于db1.data_from
.因此,压缩后的数据应如下所示:
no i want to compress them so they based on id
and date
. Date compression should be if db2.date
is older or equal than db1.date_from
and use the youngest one from db2
witch is still older or equal than db1.data_from
.So compressed data should look something like this:
id date_from date_until date v1 v2 v3 v4
0 2015-06-17 2015-06-19 2015-06-17 2 3 4 3
0 2015-06-17 2015-07-01 2015-06-17 3 4 4 3
1 2015-06-18 2015-06-20 2015-06-18 5 6 6 4
1 2015-06-19 2015-07-12 2015-06-18 5 4 6 4
除了遍历所有行外,我找不到任何解决方案,但这还不够好,因为我的数据大于100k行.是否有任何工具可以在熊猫中执行此操作,还是我应该使用其他工具?
I can't find any solution except iterating trough all rows, but this is not good enough because my data is bigger 100k of lines. Is there any tool to do this in pandas or should i use something else?
谢谢!
推荐答案
让我们使用 pd.merge_asof
:
Let's use pd.merge_asof
:
pd.merge_asof(df1,df2, left_on=['date_from'], right_on=['date'], by='id')
输出:
id date_from date_until v1 v2 date v3 v4
0 0 2015-06-17 2015-06-19 2 3 2015-06-17 4 3
1 0 2015-06-17 2015-07-01 3 4 2015-06-17 4 3
2 1 2015-06-18 2015-06-20 5 6 2015-06-18 6 4
3 1 2015-06-19 2015-07-12 5 4 2015-06-18 6 4
这篇关于根据日期加入DataFrame,该日期介于另一个DataFrame的日期之间的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!