Problem description

I want to resample a DataFrame whose MultiIndex contains both a datetime column and another key. The DataFrame looks like this:

import pandas as pd
from StringIO import StringIO

csv = StringIO("""ID,NAME,DATE,VAR1
1,a,03-JAN-2013,69
1,a,04-JAN-2013,77
1,a,05-JAN-2013,75
2,b,03-JAN-2013,69
2,b,04-JAN-2013,75
2,b,05-JAN-2013,72""")

df = pd.read_csv(csv, index_col=['DATE', 'ID'], parse_dates=['DATE'])
df.columns.name = 'Params'
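
The snippets in this question are Python 2 (`from StringIO import StringIO`, `print` statements). On Python 3, `StringIO` lives in the `io` module. A minimal sketch of the same setup, assuming Python 3 and a recent pandas:

```python
import io

import pandas as pd

# Same sample data as above; only the StringIO import differs.
csv = io.StringIO("""ID,NAME,DATE,VAR1
1,a,03-JAN-2013,69
1,a,04-JAN-2013,77
1,a,05-JAN-2013,75
2,b,03-JAN-2013,69
2,b,04-JAN-2013,75
2,b,05-JAN-2013,72""")

# DATE and ID together form the row MultiIndex; DATE is parsed to datetimes.
df = pd.read_csv(csv, index_col=['DATE', 'ID'], parse_dates=['DATE'])
df.columns.name = 'Params'
print(df)
```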

Because resampling requires a datetime index, I thought unstacking the other index level would help. It does, but I can't stack it again afterwards.
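
Unstacking `ID` moves that level into the columns, so the rows are indexed by `DATE` alone, which is what `resample` needs. A minimal sketch, assuming Python 3 and a recent pandas, that builds the same frame directly (a hypothetical construction with the CSV's values) and inspects the result of the unstack:

```python
import pandas as pd

# Rebuild the question's frame without going through read_csv.
df = pd.DataFrame({
    'ID': [1, 1, 1, 2, 2, 2],
    'NAME': ['a', 'a', 'a', 'b', 'b', 'b'],
    'DATE': pd.to_datetime(['2013-01-03', '2013-01-04', '2013-01-05'] * 2),
    'VAR1': [69, 77, 75, 69, 75, 72],
}).set_index(['DATE', 'ID'])
df.columns.name = 'Params'

# Move the ID level from the rows into the columns.
wide = df.unstack('ID')

print(type(wide.index).__name__)  # DatetimeIndex
print(list(wide.columns.names))   # ['Params', 'ID']
print(list(wide.columns))         # NAME and VAR1, one column per ID
```

Note that the non-numeric `NAME` column is unstacked too, so it ends up in the column MultiIndex next to `VAR1`.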

print df.unstack('ID').resample('W-THU')

Params      VAR1
ID               1     2
DATE
2013-01-03      69  69.0
2013-01-10      76  73.5
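
The two output rows follow from how `W-THU` bins work: each weekly bin ends on a Thursday and is labelled with that Thursday. 3 January 2013 is itself a Thursday, so it closes the first bin, while 4 and 5 January fall into the bin labelled 2013-01-10. The values 76 and 73.5 match the means (77 + 75) / 2 and (75 + 72) / 2. A small sketch, assuming a recent pandas, that checks the bin assignment for ID 1:

```python
import pandas as pd

dates = pd.to_datetime(['2013-01-03', '2013-01-04', '2013-01-05'])
var1_id1 = pd.Series([69, 77, 75], index=dates)

# 2013-01-03 is a Thursday, i.e. the end of a W-THU week.
print(dates[0].day_name())  # Thursday

# How many observations land in each weekly bin, and their average.
print(var1_id1.resample('W-THU').size())
print(var1_id1.resample('W-THU').mean())
```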

But stacking 'ID' back into the rows raises an IndexError:

print df.unstack('ID').resample('W-THU').stack('ID')

IndexError: index 0 is out of bounds for axis 0 with size 0

Strangely enough, I can stack the other column level ('Params'), either by position or by name:

print df.unstack('ID').resample('W-THU').stack(0)

print df.unstack('ID').resample('W-THU').stack('Params')

The IndexError also occurs if I swap the two column levels. Does anyone know how to get around this?

Recommended answer

The example unstacks the non-numeric column 'NAME', which is silently dropped during resampling but still causes problems when re-stacking. Selecting only the numeric column first worked for me:

print df[['VAR1']].unstack('ID').resample('W-THU').stack('ID')
Params         VAR1
DATE       ID
2013-01-03 1   69.0
           2   69.0
2013-01-10 1   76.0
           2   73.5
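
On current pandas, `resample` no longer aggregates by itself, so the aggregation has to be named explicitly. A sketch of the answer's approach under that assumption (Python 3, pandas 2.x), next to an alternative that is not part of the original answer: grouping directly on the row MultiIndex with `pd.Grouper`, which avoids the unstack/stack round trip altogether. Both should give the same weekly means:

```python
import pandas as pd

# The question's data, built directly instead of via read_csv.
df = pd.DataFrame({
    'ID': [1, 1, 1, 2, 2, 2],
    'NAME': ['a', 'a', 'a', 'b', 'b', 'b'],
    'DATE': pd.to_datetime(['2013-01-03', '2013-01-04', '2013-01-05'] * 2),
    'VAR1': [69, 77, 75, 69, 75, 72],
}).set_index(['DATE', 'ID'])
df.columns.name = 'Params'

# 1) The answer's approach: keep only the numeric column, unstack ID,
#    resample with an explicit mean, then stack ID back into the rows.
via_stack = df[['VAR1']].unstack('ID').resample('W-THU').mean().stack('ID')

# 2) Alternative: bin the DATE level weekly and group by the ID level in one step.
via_groupby = df.groupby(
    [pd.Grouper(level='DATE', freq='W-THU'), pd.Grouper(level='ID')]
)['VAR1'].mean()

print(via_stack)
print(via_groupby)
```

The `groupby` form keeps `DATE` and `ID` in the rows the whole time, so there is no column level to re-stack and no non-numeric column to worry about once `VAR1` is selected.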

08-14 01:22