Problem description
I want to resample a DataFrame with a MultiIndex that contains both a datetime level and some other key. The DataFrame looks like this:
import pandas as pd
from StringIO import StringIO
csv = StringIO("""ID,NAME,DATE,VAR1
1,a,03-JAN-2013,69
1,a,04-JAN-2013,77
1,a,05-JAN-2013,75
2,b,03-JAN-2013,69
2,b,04-JAN-2013,75
2,b,05-JAN-2013,72""")
df = pd.read_csv(csv, index_col=['DATE', 'ID'], parse_dates=['DATE'])
df.columns.name = 'Params'
Because resampling is only allowed on datetime indexes, I thought unstacking the other index level would help. It does, but I can't stack it again afterwards.
print df.unstack('ID').resample('W-THU')
Params      VAR1
ID             1     2
DATE
2013-01-03    69  69.0
2013-01-10    76  73.5
But then stacking 'ID' again results in an IndexError:
print df.unstack('ID').resample('W-THU').stack('ID')
IndexError: index 0 is out of bounds for axis 0 with size 0
Strangely enough, I can stack the other column level with either of these:
print df.unstack('ID').resample('W-THU').stack(0)
and
print df.unstack('ID').resample('W-THU').stack('Params')
The IndexError also occurs if I reorder (swap) the two column levels. Does anyone know how to overcome this issue?
Recommended answer
The example unstacks a non-numeric column, 'NAME', which is silently dropped but causes problems during re-stacking. The code below worked for me:
print df[['VAR1']].unstack('ID').resample('W-THU').stack('ID')
Params         VAR1
DATE       ID
2013-01-03 A   69.0
           B   69.0
2013-01-10 A   76.0
           B   73.5
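For anyone on a recent pandas, here is a rough sketch of the same idea. It rests on a few assumptions that are not in the original post: Python 3, a pandas version where `resample` returns a deferred object that needs an explicit aggregation such as `.mean()`, and the question's rows rebuilt by hand with ISO dates instead of being parsed from the CSV. The `groupby` with `pd.Grouper` at the end is a different technique from the one in the answer, shown only as a cross-check.

```python
import pandas as pd

# Same rows as the question's CSV, rebuilt by hand (assumption: ISO dates
# are used here only to avoid any date-format guessing).
df = pd.DataFrame(
    {
        "ID": [1, 1, 1, 2, 2, 2],
        "NAME": ["a", "a", "a", "b", "b", "b"],
        "DATE": pd.to_datetime(["2013-01-03", "2013-01-04", "2013-01-05"] * 2),
        "VAR1": [69, 77, 75, 69, 75, 72],
    }
).set_index(["DATE", "ID"])
df.columns.name = "Params"

# The answer's approach: keep only the numeric column, move ID into the
# columns, resample on the remaining datetime index, then move ID back.
# The explicit .mean() is an assumption about newer pandas versions.
weekly = (
    df[["VAR1"]]
    .unstack("ID")
    .resample("W-THU")
    .mean()
    .stack("ID")
)
print(weekly)

# Alternative (not from the answer): group on a weekly bin of the DATE
# level together with the ID level, without unstacking at all.
grouped = df.groupby([pd.Grouper(level="DATE", freq="W-THU"), "ID"])["VAR1"].mean()
print(grouped)
```

Both results are indexed by (DATE, ID) and hold the same weekly means as the question's output: 69 for both IDs in the week ending 2013-01-03, then 76.0 and 73.5 in the week ending 2013-01-10.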