根据日期加入DataFrame

根据日期加入DataFrame

本文介绍了根据日期加入DataFrame,该日期介于另一个DataFrame的日期之间的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有两个熊猫数据框.第一个看起来像(db1):

I have two Pandas Data Frames. First one looks like (db1):

id   date_from   date_until   v1   v2
0    2015-06-17  2015-06-19    2    3
0    2015-06-17  2015-07-01    3    4
1    2015-06-18  2015-06-20    5    6
1    2015-06-19  2015-07-12    5    4

和第二个(db2),如下所示:

and the second one (db2), like this:

id      date      v3   v4
0    2015-06-17   4    3
1    2015-06-17   5    6
1    2015-06-18   6    4

不,我想压缩它们,以便它们基于iddate.如果db2.date大于或等于db1.date_from,并且使用db2中最小的那个,则日期压缩应该仍然大于或等于db1.data_from.因此,压缩后的数据应如下所示:

no i want to compress them so they based on id and date. Date compression should be if db2.date is older or equal than db1.date_from and use the youngest one from db2 witch is still older or equal than db1.data_from.So compressed data should look something like this:

id   date_from   date_until      date      v1   v2   v3   v4
0    2015-06-17  2015-06-19   2015-06-17   2    3    4    3
0    2015-06-17  2015-07-01   2015-06-17   3    4    4    3
1    2015-06-18  2015-06-20   2015-06-18   5    6    6    4
1    2015-06-19  2015-07-12   2015-06-18   5    4    6    4

除了遍历所有行外,我找不到任何解决方案,但这还不够好,因为我的数据大于100k行.是否有任何工具可以在熊猫中执行此操作,还是我应该使用其他工具?

I can't find any solution except iterating trough all rows, but this is not good enough because my data is bigger 100k of lines. Is there any tool to do this in pandas or should i use something else?

谢谢!

推荐答案

让我们使用 pd.merge_asof :

Let's use pd.merge_asof:

pd.merge_asof(df1,df2, left_on=['date_from'], right_on=['date'], by='id')

输出:

   id  date_from  date_until  v1  v2       date  v3  v4
0   0 2015-06-17  2015-06-19   2   3 2015-06-17   4   3
1   0 2015-06-17  2015-07-01   3   4 2015-06-17   4   3
2   1 2015-06-18  2015-06-20   5   6 2015-06-18   6   4
3   1 2015-06-19  2015-07-12   5   4 2015-06-18   6   4

这篇关于根据日期加入DataFrame,该日期介于另一个DataFrame的日期之间的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!

08-13 16:41