Problem description
I have 4 dataframes with similar datetime indexes. Each of them is missing a few rows, and I know the gaps can be filled using the previous known data.
I would like to align these dataframes so that they all share the union of all their indexes, with the missing values filled in. I know how to do it for 2 dataframes:
df1, df2 = df1.align(df2, axis=0, method='pad')
but what is a good way to do it for more than 2?
I have tried this and it does work:
df1 = pd.DataFrame({'values': 1}, index=pd.DatetimeIndex(['2016-06-01', '2016-06-03']))
df2 = pd.DataFrame({'values': 2}, index=pd.DatetimeIndex(['2016-06-02', '2016-06-04', '2016-06-07']))
df3 = pd.DataFrame({'values': 3}, index=pd.DatetimeIndex(['2016-06-01', '2016-06-05']))
commonIndex = df1.index.join(df2.index, how='outer').join(df3.index, how='outer')
for d in [df1, df2, df3]:
d = d.reindex(index=commonIndex, method='pad')
However, when I try the same approach on my real data, it raises an error: "ValueError: index must be monotonic increasing or decreasing". It is price data, and the index looks like this:
DatetimeIndex(['2014-03-24 00:00:00', '2014-03-24 00:01:00',
'2014-03-24 00:02:00', '2014-03-24 00:03:00',
'2014-03-24 00:04:00', '2014-03-24 00:05:00',
'2014-03-24 00:06:00', '2014-03-24 00:07:00',
'2014-03-24 00:08:00', '2014-03-24 00:09:00',
...
'2014-10-10 17:51:00', '2014-10-10 17:52:00',
'2014-10-10 17:53:00', '2014-10-10 17:54:00',
'2014-10-10 17:55:00', '2014-10-10 17:56:00',
'2014-10-10 17:57:00', '2014-10-10 17:58:00',
'2014-10-10 17:59:00', '2014-10-10 18:00:00'],
dtype='datetime64[ns]', name=u'datetime', length=139671, freq=None)
As far as I understand, the index should be increasing. I'm not sure about 'monotonic', but I assumed it wasn't a constraint, since the example above works even though it has missing dates.
Any help is greatly appreciated, and forgive me if I'm not using the correct terminology.
P.S. When I iterate through the list of dataframes, it looks like the reindexed results are not actually saved in df1, df2, df3. How do I fix that?
Recommended answer
Is this the behavior you are trying to achieve? Note that this method works regardless of whether the indexes on the dataframes are monotonic.
df1 = pd.DataFrame({'values': 1}, index=pd.DatetimeIndex(['2016-06-01', '2016-06-03']))
df2 = pd.DataFrame({'values': 2}, index=pd.DatetimeIndex(['2016-06-02', '2016-06-04', '2016-06-07']))
df3 = pd.DataFrame({'values': 3}, index=pd.DatetimeIndex(['2016-06-01', '2016-06-05']))
df = pd.concat([df1,df2,df3], axis=1).ffill().bfill()
df.columns = ['values1', 'values2', 'values3']
df
Which gives:
            values1  values2  values3
2016-06-01      1.0      2.0      3.0
2016-06-02      1.0      2.0      3.0
2016-06-03      1.0      2.0      3.0
2016-06-04      1.0      2.0      3.0
2016-06-05      1.0      2.0      3.0
2016-06-07      1.0      2.0      3.0
Or, if you just want the dataframes left separate, this will also work regardless of whether the dataframe has a monotonic index.
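# Use .union() rather than |: since pandas 2.0, | on an Index is a logical OR, not a set union
commonIndex = df1.index.union(df2.index).union(df3.index)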
df2.reindex(commonIndex).ffill()
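As for the P.S.: inside the loop, `d = d.reindex(...)` only rebinds the loop variable `d`, so `df1`, `df2` and `df3` never change. One way around it is to collect the results in a list and unpack them back into the names. This is only a rough sketch; `frames`, `common_index` and `aligned` are names I made up, and it assumes none of the indexes contains duplicate timestamps.

```python
import pandas as pd

df1 = pd.DataFrame({'values': 1}, index=pd.DatetimeIndex(['2016-06-01', '2016-06-03']))
df2 = pd.DataFrame({'values': 2}, index=pd.DatetimeIndex(['2016-06-02', '2016-06-04', '2016-06-07']))
df3 = pd.DataFrame({'values': 3}, index=pd.DatetimeIndex(['2016-06-01', '2016-06-05']))

frames = [df1, df2, df3]  # hypothetical name; works for any number of dataframes

# Build the union of all indexes, then sort it so forward-filling follows time order.
common_index = frames[0].index
for frame in frames[1:]:
    common_index = common_index.union(frame.index)
common_index = common_index.sort_values()

# reindex() returns a new object, so keep the results and rebind the names.
aligned = [frame.reindex(common_index).ffill() for frame in frames]
df1, df2, df3 = aligned
```

With this data the first row of `df2` stays NaN, because there is no earlier value to carry forward; chain `.bfill()` after `.ffill()` if you want that filled too.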
Edit:
I had a snippet here that reproduced your error, but I think it works better as its own question, so take a look here.
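For reference, here is a small sketch of one way this error can come up. I can't tell from the question what is wrong with the real index, so the out-of-order row below is purely an assumption, and `unsorted`, `target`, `cleaned` and `padded` are made-up names. The idea is that `reindex(..., method='pad')` needs the existing index to be sorted, whereas gaps between dates are fine.

```python
import pandas as pd

# Hypothetical data: the last two timestamps are out of time order.
unsorted = pd.DataFrame(
    {'values': [1, 2, 3]},
    index=pd.DatetimeIndex(['2016-06-01', '2016-06-03', '2016-06-02']),
)
target = pd.date_range('2016-06-01', '2016-06-04', freq='D')

# Quick diagnostics worth running on the real data.
is_sorted = unsorted.index.is_monotonic_increasing
has_duplicates = unsorted.index.duplicated().any()

# Padding against an unsorted index raises the ValueError from the question.
try:
    unsorted.reindex(target, method='pad')
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# Dropping duplicate timestamps (keeping the last one) and sorting
# makes the index monotonic, after which padding works.
cleaned = unsorted[~unsorted.index.duplicated(keep='last')].sort_index()
padded = cleaned.reindex(target, method='pad')
```

If `is_sorted` comes back False on your price data, a `sort_index()` before reindexing is probably all that is needed; the `ffill()` approach above avoids the check entirely, but it still fills in whatever order the rows happen to be in.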