I have some four-dimensional hydrological data sampled at 12-hour intervals. I want to compute daily means with the following code:

>>> InNcFile = Dataset ( InputFile, 'r' )
>>> Time  = InNcFile.variables['time'][:]
>>> Latitude  = InNcFile.variables['lat'][:]
>>> Longitude = InNcFile.variables['lon'][:]
>>> ZLevel = InNcFile.variables['lvl'][:]
>>> SM = InNcFile.variables['sm'][:,:,:,:]
>>> DateTime = map ( lambda x: datetime.strptime ( x, '%Y%m%d%H%M' ), Time )
>>> df_SMoist = pandas.Panel4D ( SM, labels = DateTime, items = ZLevel, major_axis = Latitude, minor_axis = Longitude )
>>> SM.shape
(21, 4, 769, 1024)
>>> df_SMoist.shape
(21, 4, 769, 1024)
>>> df_MeanSM = df_SMoist.resample ( 'D', how = 'mean', axis = 0 )

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/projects/access/apps/pythonlib/pandas/0.12.0/pandas-0.12.0-py2.7-linux-x86_64.egg/pandas/core/generic.py", line 290, in resample
    return sampler.resample(self)
  File "/projects/access/apps/pythonlib/pandas/0.12.0/pandas-0.12.0-py2.7-linux-x86_64.egg/pandas/tseries/resample.py", line 83, in resample
    rs = self._resample_timestamps(obj)
  File "/projects/access/apps/pythonlib/pandas/0.12.0/pandas-0.12.0-py2.7-linux-x86_64.egg/pandas/tseries/resample.py", line 209, in _resample_timestamps
    grouped = obj.groupby(grouper, axis=self.axis)
  File "/projects/access/apps/pythonlib/pandas/0.12.0/pandas-0.12.0-py2.7-linux-x86_64.egg/pandas/core/panelnd.py", line 111, in func
    raise NotImplementedError
NotImplementedError


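If I instead make the SM array three-dimensional with only one ZLevel (i.e. use a Panel instead of a Panel4D), it works fine. Can you help me figure out what I'm doing wrong?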

Thanks.

Best answer

Panel4D does not yet have an API as rich as DataFrame's. You can work around this
by loading the 4-dimensional data into a 2-dimensional
DataFrame with a MultiIndex.

For example, if your SM, dates, zlevel, latitude and longitude arrays look
like this:

import numpy as np
import pandas as pd

shape = (5,2,3,4)
SM = np.arange(np.prod(shape)).reshape(shape)
dates = pd.date_range('2000-1-1', periods=shape[0], freq='12H')
zlevel = np.arange(shape[1])
lat = np.arange(shape[2])
lng = np.arange(shape[3])
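With real data, dates would come from the netCDF time variable rather than from pd.date_range. The question parses each time value with datetime.strptime(x, '%Y%m%d%H%M'), which suggests the times are stored as strings like '200001011200'. Assuming that format (the sample strings below are made up for illustration), pd.to_datetime with an explicit format builds the DatetimeIndex in one call, with no map/lambda:

```python
import pandas as pd

# Hypothetical time stamps in the YYYYmmddHHMM format assumed from the question
time_strings = ['200001010000', '200001011200', '200001020000']

# Parse all strings at once into a DatetimeIndex
dates = pd.to_datetime(time_strings, format='%Y%m%d%H%M')
print(dates)
```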


then you can build a DataFrame with a MultiIndex like this:

index = pd.MultiIndex.from_product([dates, zlevel, lat, lng])
index.names = ['dates', 'zlevel', 'lat', 'long']
df = pd.DataFrame(SM.ravel(), index=index)
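This works because MultiIndex.from_product varies its last level fastest, which is the same order in which ravel() flattens a C-ordered NumPy array, so the row labelled (dates[i], z, y, x) holds SM[i, z, y, x]. A quick check on the toy data above (same arrays, with the level names passed through the names= argument instead of assigned afterwards):

```python
import numpy as np
import pandas as pd

# Same toy data as above
shape = (5, 2, 3, 4)
SM = np.arange(np.prod(shape)).reshape(shape)
dates = pd.date_range('2000-1-1', periods=shape[0], freq='12h')
zlevel, lat, lng = (np.arange(n) for n in shape[1:])

index = pd.MultiIndex.from_product([dates, zlevel, lat, lng],
                                   names=['dates', 'zlevel', 'lat', 'long'])
df = pd.DataFrame(SM.ravel(), index=index)

# Every (date, zlevel, lat, long) row should hold the matching array element
for i, z, y, x in np.ndindex(shape):
    assert df.loc[(dates[i], z, y, x), 0] == SM[i, z, y, x]
print("index order matches SM.ravel()")
```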


To resample by date, the index must be a DatetimeIndex, TimedeltaIndex or PeriodIndex, not a MultiIndex. So we need to move the zlevel, lat and long index levels into the columns:

df = df.unstack(['zlevel', 'lat', 'long'])


Now df looks like this:

In [87]: df
Out[87]:
                      0                                           ...        \
zlevel                0                                           ...     1
lat                   0                1                   2      ...     0
long                  0   1   2   3    0    1    2    3    0    1 ...     2
dates                                                             ...
2000-01-01 00:00:00   0   1   2   3    4    5    6    7    8    9 ...    14
2000-01-01 12:00:00  24  25  26  27   28   29   30   31   32   33 ...    38
2000-01-02 00:00:00  48  49  50  51   52   53   54   55   56   57 ...    62
2000-01-02 12:00:00  72  73  74  75   76   77   78   79   80   81 ...    86
2000-01-03 00:00:00  96  97  98  99  100  101  102  103  104  105 ...   110


zlevel
lat                         1                   2
long                   3    0    1    2    3    0    1    2    3
dates
2000-01-01 00:00:00   15   16   17   18   19   20   21   22   23
2000-01-01 12:00:00   39   40   41   42   43   44   45   46   47
2000-01-02 00:00:00   63   64   65   66   67   68   69   70   71
2000-01-02 12:00:00   87   88   89   90   91   92   93   94   95
2000-01-03 00:00:00  111  112  113  114  115  116  117  118  119

[5 rows x 24 columns]
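Since all three non-time levels now live in the columns, the unstacked frame is simply SM with its last three axes flattened: one row per time step and n_zlevel * n_lat * n_lng columns, with a plain DatetimeIndex left on the rows. A sketch confirming this on the toy data (it uses .to_numpy(), which assumes pandas 0.24 or later):

```python
import numpy as np
import pandas as pd

shape = (5, 2, 3, 4)
SM = np.arange(np.prod(shape)).reshape(shape)
dates = pd.date_range('2000-1-1', periods=shape[0], freq='12h')
index = pd.MultiIndex.from_product(
    [dates] + [np.arange(n) for n in shape[1:]],
    names=['dates', 'zlevel', 'lat', 'long'])

# Long format, then move every non-time level into the columns
wide = pd.DataFrame(SM.ravel(), index=index).unstack(['zlevel', 'lat', 'long'])

print(wide.shape)                 # (5, 24)
print(type(wide.index).__name__)  # DatetimeIndex
# The values are just SM with its last three axes flattened
print(np.array_equal(wide.to_numpy(), SM.reshape(shape[0], -1)))  # True
```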


Now we can resample by date:

In [88]: df.resample('D', how='mean', axis=0)
Out[88]:
             0                                           ...                  \
zlevel       0                                           ...     1
lat          0                1                   2      ...     0         1
long         0   1   2   3    0    1    2    3    0    1 ...     2    3    0
dates                                                    ...
2000-01-01  12  13  14  15   16   17   18   19   20   21 ...    26   27   28
2000-01-02  60  61  62  63   64   65   66   67   68   69 ...    74   75   76
2000-01-03  96  97  98  99  100  101  102  103  104  105 ...   110  111  112


zlevel
lat                          2
long          1    2    3    0    1    2    3
dates
2000-01-01   29   30   31   32   33   34   35
2000-01-02   77   78   79   80   81   82   83
2000-01-03  113  114  115  116  117  118  119

[3 rows x 24 columns]
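The how= keyword used above belongs to the old pandas API of that era; it was later deprecated and removed, and current pandas writes the same operation as a method chain, df.resample('D').mean(). The sketch below assumes a recent pandas and also reshapes the daily means back into a 4-D array laid out like SM, i.e. (days, zlevel, lat, lng), which may be convenient when writing results back to netCDF. The name daily_sm is just an illustrative choice:

```python
import numpy as np
import pandas as pd

shape = (5, 2, 3, 4)
SM = np.arange(np.prod(shape)).reshape(shape)
dates = pd.date_range('2000-1-1', periods=shape[0], freq='12h')
index = pd.MultiIndex.from_product(
    [dates] + [np.arange(n) for n in shape[1:]],
    names=['dates', 'zlevel', 'lat', 'long'])
df = pd.DataFrame(SM.ravel(), index=index).unstack(['zlevel', 'lat', 'long'])

# Modern equivalent of df.resample('D', how='mean', axis=0)
daily = df.resample('D').mean()

# Back to a 4-D array: (n_days, n_zlevel, n_lat, n_lng)
daily_sm = daily.to_numpy().reshape((len(daily),) + shape[1:])
print(daily_sm[:, 0, 0, 0])  # [12. 60. 96.]
```

Note that 2000-01-03 has only one 12-hourly sample in the toy data, so its "mean" is just that sample; a calendar day with no samples at all would come out as a row of NaN.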

Regarding python - Python Pandas Panel4D resampling, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/28847571/

10-11 01:20