Problem description
I have two pandas DataFrames, both indexed by datetime entries. df1 has a non-unique time index, whereas df2 has a unique one. I would like to add a column df2.a to df1 in the following way: for every row in df1 with timestamp ts, df1.a should contain the most recent value of df2.a whose timestamp is less than ts.
For example, let's say that df2 is sampled every minute, and df1 has rows with timestamps 08:00:15, 08:00:47, and 08:02:35. In this case I would like the value from df2.a[08:00:00] to be used for the first two rows, and df2.a[08:02:00] for the third. How can I do this?
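Apply a function to each row of df1 that reindexes df2.a at the row's timestamp with forward fill. The fill has to be requested through reindex(method='ffill'): calling .ffill() after reindexing to a single label has no earlier value to fill from and returns NaN unless the timestamp matches exactly. This requires df2's index to be sorted, and it also picks up a value whose timestamp equals ts.
df1['df2.a'] = df1.apply(lambda x: df2.a.reindex([x.name], method='ffill').iloc[0], axis=1)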
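As an alternative to the row-wise apply, the same lookup can probably be done in one vectorized call with pd.merge_asof, which is a different technique from the one in the answer above. The sketch below is only an illustration: the sample values, the column x, and the 2024-01-01 date are made up, and both frames are assumed to be sortable by their index. Passing allow_exact_matches=False keeps the comparison strictly "less than ts", as the question asks.

```python
import pandas as pd

# Hypothetical sample data: df2 is sampled every minute and has a unique index.
df2 = pd.DataFrame(
    {"a": [10.0, 11.0, 12.0]},
    index=pd.to_datetime(
        ["2024-01-01 08:00:00", "2024-01-01 08:01:00", "2024-01-01 08:02:00"]
    ),
)

# df1 has a non-unique index (08:00:47 appears twice) and one timestamp
# (08:01:00) that exactly matches a df2 timestamp.
df1 = pd.DataFrame(
    {"x": [1, 2, 3, 4, 5]},
    index=pd.to_datetime(
        [
            "2024-01-01 08:00:15",
            "2024-01-01 08:00:47",
            "2024-01-01 08:00:47",
            "2024-01-01 08:01:00",
            "2024-01-01 08:02:35",
        ]
    ),
)

# merge_asof needs both sides sorted by the join key (here, the index).
left = df1.sort_index()
right = df2[["a"]].rename(columns={"a": "df2.a"}).sort_index()

# direction="backward" takes the last df2 row at or before each df1 timestamp;
# allow_exact_matches=False excludes rows whose timestamp equals it.
merged = pd.merge_asof(
    left,
    right,
    left_index=True,
    right_index=True,
    direction="backward",
    allow_exact_matches=False,
)

print(merged)
```

With this sample data, the rows at 08:00:15, 08:00:47 (both copies), and 08:01:00 take the value from 08:00:00, and the row at 08:02:35 takes the value from 08:02:00. If a value at exactly ts should count as well, leaving allow_exact_matches at its default of True should give that behavior.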