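I have some hydrological data with four dimensions, sampled at 12-hour intervals. I would like to compute daily averages using the following code: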
>>> InNcFile = Dataset ( InputFile, 'r' )
>>> Time = InNcFile.variables['time'][:]
>>> Latitude = InNcFile.variables['lat'][:]
>>> Longitude = InNcFile.variables['lon'][:]
>>> ZLevel = InNcFile.variables['lvl'][:]
>>> SM = InNcFile.variables['sm'][:,:,:,:]
>>> DateTime = map ( lambda x: datetime.strptime ( x, '%Y%m%d%H%M' ), Time )
>>> df_SMoist = pandas.Panel4D ( SM, labels = DateTime, items = ZLevel, major_axis = Latitude, minor_axis = Longitude )
>>> SM.shape
(21, 4, 769, 1024)
>>> df_SMoist.shape
(21, 4, 769, 1024)
>>> df_MeanSM = df_SMoist.resample ( 'D', how = 'mean', axis = 0 )
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/projects/access/apps/pythonlib/pandas/0.12.0/pandas-0.12.0-py2.7-linux-x86_64.egg/pandas/core/generic.py", line 290, in resample
return sampler.resample(self)
File "/projects/access/apps/pythonlib/pandas/0.12.0/pandas-0.12.0-py2.7-linux-x86_64.egg/pandas/tseries/resample.py", line 83, in resample
rs = self._resample_timestamps(obj)
File "/projects/access/apps/pythonlib/pandas/0.12.0/pandas-0.12.0-py2.7-linux-x86_64.egg/pandas/tseries/resample.py", line 209, in _resample_timestamps
grouped = obj.groupby(grouper, axis=self.axis)
File "/projects/access/apps/pythonlib/pandas/0.12.0/pandas-0.12.0-py2.7-linux-x86_64.egg/pandas/core/panelnd.py", line 111, in func
raise NotImplementedError
NotImplementedError
Now, if I make the SM array 3-dimensional with only one ZLevel (i.e. use Panel instead of Panel4D), it works fine. Could you help me figure out what I am doing wrong?
Thanks.
Best answer
Panel4D does not yet have an API as rich as that of DataFrames. You can work around this by loading the 4-dimensional data into a 2-dimensional DataFrame with a MultiIndex.
For example, if your SM, dates, zlevel, latitude and longitude look like this:
import numpy as np
import pandas as pd
shape = (5,2,3,4)
SM = np.arange(np.prod(shape)).reshape(shape)
dates = pd.date_range('2000-1-1', periods=shape[0], freq='12H')
zlevel = np.arange(shape[1])
lat = np.arange(shape[2])
lng = np.arange(shape[3])
then you can build a DataFrame with a MultiIndex like this:
index = pd.MultiIndex.from_product([dates, zlevel, lat, lng])
index.names = ['dates', 'zlevel', 'lat', 'long']
df = pd.DataFrame(SM.ravel(), index=index)
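The flattening works because `SM.ravel()` walks the array in C order (last axis varying fastest), which is the same order in which `MultiIndex.from_product` enumerates its label combinations. Below is a minimal sketch to check that on the toy data. The names `samples`, `flat` and `matches` are illustrative only, and `pd.Timedelta(hours=12)` is used instead of the `'12H'` alias as an assumption, to avoid depending on how a given pandas version spells that alias.

```python
import numpy as np
import pandas as pd

shape = (5, 2, 3, 4)
SM = np.arange(np.prod(shape)).reshape(shape)
dates = pd.date_range('2000-01-01', periods=shape[0], freq=pd.Timedelta(hours=12))
zlevel, lat, lng = (np.arange(n) for n in shape[1:])

index = pd.MultiIndex.from_product([dates, zlevel, lat, lng],
                                   names=['dates', 'zlevel', 'lat', 'long'])
df = pd.DataFrame(SM.ravel(), index=index)

# Hypothetical spot checks: (time, zlevel, lat, long) positions to compare.
samples = [(0, 0, 0, 0), (1, 1, 2, 3), (4, 0, 1, 2)]
flat = df[0]  # the single data column, as a Series
matches = [flat.loc[(dates[t], z, i, j)] == SM[t, z, i, j]
           for t, z, i, j in samples]
print(all(matches))
```

If the array were flattened in a different order (for example Fortran order), the labels and values would no longer line up, so the axis order passed to `from_product` should follow the array's axis order.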
To resample by date, the index must be a DatetimeIndex, TimedeltaIndex or PeriodIndex, not a MultiIndex. So we need to move the zlevel, lat and long index levels into the columns:
df = df.unstack(['zlevel', 'lat', 'long'])
Now df looks like this:
In [87]: df
Out[87]:
0 ... \
zlevel 0 ... 1
lat 0 1 2 ... 0
long 0 1 2 3 0 1 2 3 0 1 ... 2
dates ...
2000-01-01 00:00:00 0 1 2 3 4 5 6 7 8 9 ... 14
2000-01-01 12:00:00 24 25 26 27 28 29 30 31 32 33 ... 38
2000-01-02 00:00:00 48 49 50 51 52 53 54 55 56 57 ... 62
2000-01-02 12:00:00 72 73 74 75 76 77 78 79 80 81 ... 86
2000-01-03 00:00:00 96 97 98 99 100 101 102 103 104 105 ... 110
zlevel
lat 1 2
long 3 0 1 2 3 0 1 2 3
dates
2000-01-01 00:00:00 15 16 17 18 19 20 21 22 23
2000-01-01 12:00:00 39 40 41 42 43 44 45 46 47
2000-01-02 00:00:00 63 64 65 66 67 68 69 70 71
2000-01-02 12:00:00 87 88 89 90 91 92 93 94 95
2000-01-03 00:00:00 111 112 113 114 115 116 117 118 119
[5 rows x 24 columns]
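The printed frame is wrapped and hard to read, so it may help to check the result of the unstacking step programmatically. This sketch rebuilds the toy data and inspects the row index type and the shape. The names `wide` and `rows_match` are made up for illustration, and `pd.Timedelta(hours=12)` again stands in for the `'12H'` alias as an assumption.

```python
import numpy as np
import pandas as pd

shape = (5, 2, 3, 4)
SM = np.arange(np.prod(shape)).reshape(shape)
dates = pd.date_range('2000-01-01', periods=shape[0], freq=pd.Timedelta(hours=12))
index = pd.MultiIndex.from_product(
    [dates] + [np.arange(n) for n in shape[1:]],
    names=['dates', 'zlevel', 'lat', 'long'])
wide = pd.DataFrame(SM.ravel(), index=index).unstack(['zlevel', 'lat', 'long'])

print(type(wide.index).__name__)  # class of the row index after unstacking
print(wide.shape)                 # one row per timestamp, one column per grid cell

# Each row should hold one flattened (zlevel, lat, long) time slice.
rows_match = np.array_equal(wide.to_numpy(), SM.reshape(shape[0], -1))
print(rows_match)
```

Only the dates level remains on the rows, which is what makes the frame acceptable to `resample`.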
Now we can resample by date:
In [88]: df.resample('D', how='mean', axis=0)
Out[88]:
0 ... \
zlevel 0 ... 1
lat 0 1 2 ... 0 1
long 0 1 2 3 0 1 2 3 0 1 ... 2 3 0
dates ...
2000-01-01 12 13 14 15 16 17 18 19 20 21 ... 26 27 28
2000-01-02 60 61 62 63 64 65 66 67 68 69 ... 74 75 76
2000-01-03 96 97 98 99 100 101 102 103 104 105 ... 110 111 112
zlevel
lat 2
long 1 2 3 0 1 2 3
dates
2000-01-01 29 30 31 32 33 34 35
2000-01-02 77 78 79 80 81 82 83
2000-01-03 113 114 115 116 117 118 119
[3 rows x 24 columns]
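The session above uses the `how=` keyword from the pandas version in the question. On newer pandas releases, where the aggregation is called as a method on the resampler, the same step can be sketched as below. The example also reshapes the daily means back into a 4-dimensional NumPy array, which is not part of the original answer. The names `wide`, `daily` and `daily_4d` are illustrative, and `pd.Timedelta(hours=12)` replaces the `'12H'` alias as an assumption.

```python
import numpy as np
import pandas as pd

shape = (5, 2, 3, 4)
SM = np.arange(np.prod(shape)).reshape(shape)
dates = pd.date_range('2000-01-01', periods=shape[0], freq=pd.Timedelta(hours=12))
index = pd.MultiIndex.from_product(
    [dates] + [np.arange(n) for n in shape[1:]],
    names=['dates', 'zlevel', 'lat', 'long'])
wide = pd.DataFrame(SM.ravel(), index=index).unstack(['zlevel', 'lat', 'long'])

# Method-style aggregation instead of the how= keyword.
daily = wide.resample('D').mean()

# Sort the columns into (zlevel, lat, long) order before reshaping;
# the reshape below relies on that ordering.
daily = daily.sort_index(axis=1)
daily_4d = daily.to_numpy().reshape((len(daily),) + shape[1:])
print(daily_4d.shape)
```

The last day in the toy data has a single 12-hourly sample, so its daily mean is just that sample, which agrees with the 2000-01-03 row in the output above.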
On the topic of python - Python Pandas Panel4D resampling, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/28847571/