pandas: add a column with the most recent value

Problem description

I have two pandas dataframes, both indexed by datetime entries. df1 has non-unique time indices, whereas df2 has unique ones. I would like to add a column df2.a to df1 in the following way: for every row in df1 with timestamp ts, df1.a should contain the most recent value of df2.a whose timestamp is less than ts.

For example, let's say that df2 is sampled every minute, and there are rows with timestamps 08:00:15, 08:00:47, 08:02:35 in df1. In this case I would like the value from df2.a[08:00:00] to be used for the first two rows, and df2.a[08:02:00] for the third. How can I do this?
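In the regular case described here (df2 on a complete one-minute grid), the df2 row each df1 timestamp should use is just that timestamp floored to the whole minute. The sketch below shows only that mapping; the date is a hypothetical placeholder, and a gap in df2 or an irregular grid would break this shortcut, which is why the general solution needs a real "last value at or before ts" lookup.

```python
import pandas as pd

# Timestamps from the example in the question (the date is a hypothetical placeholder).
ts = pd.to_datetime(["2024-01-01 08:00:15", "2024-01-01 08:00:47", "2024-01-01 08:02:35"])

# Assuming df2 has one row per whole minute, the row each timestamp should use
# is that timestamp floored to the minute.
lookup_keys = ts.floor("min")
print(lookup_keys.strftime("%H:%M:%S").tolist())
# ['08:00:00', '08:00:00', '08:02:00']
```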

Solution

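Reindex df2.a on df1's timestamps with method='ffill', so each row of df1 takes the last df2.a value at or before its timestamp. This requires df2's index to be sorted in ascending order; df1's index may be unsorted and may contain duplicates.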

df1['df2.a'] = df2['a'].reindex(df1.index, method='ffill').to_numpy()  # .to_numpy() skips alignment on df1's non-unique index
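A self-contained sketch of the reindex approach on data shaped like the question's example (all values and the date are hypothetical). Note that reindex with method='ffill' matches "at or before" ts, so an exact timestamp match uses that row itself. If the match must be strictly earlier, as the question literally says, pd.merge_asof with allow_exact_matches=False is one alternative; it is a swapped-in technique, not part of the original answer, and it needs both frames sorted on the key.

```python
import pandas as pd

# Hypothetical df2: one row per minute, unique and sorted index.
df2 = pd.DataFrame(
    {"a": [10, 11, 12]},
    index=pd.to_datetime(
        ["2024-01-01 08:00:00", "2024-01-01 08:01:00", "2024-01-01 08:02:00"]
    ),
)

# Hypothetical df1: irregular, unsorted, with a duplicate and an exact match.
df1 = pd.DataFrame(
    {"x": [1, 2, 3, 4, 5]},
    index=pd.to_datetime([
        "2024-01-01 08:00:15",
        "2024-01-01 08:00:47",
        "2024-01-01 08:02:35",
        "2024-01-01 08:00:47",  # duplicate timestamp
        "2024-01-01 08:01:00",  # exactly on a df2 timestamp
    ]),
)

# Reindex approach: last df2.a value at or before each df1 timestamp.
df1["df2.a"] = df2["a"].reindex(df1.index, method="ffill").to_numpy()
print(df1)

# merge_asof alternative: backward search, excluding exact matches,
# so each row gets the last df2.a value strictly before its timestamp.
left = df1.rename_axis("ts").reset_index().sort_values("ts", kind="stable")
right = df2.rename_axis("ts").reset_index()
merged = pd.merge_asof(
    left, right, on="ts", direction="backward", allow_exact_matches=False
)
print(merged)
```

The reindex version keeps df1's original row order. The merge_asof version returns rows in sorted timestamp order, so map it back (for example via a saved row position) if the order matters.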

这篇关于 pandas ：添加具有最新值的列的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持！