Question
I have a netCDF file whose time dimension contains hourly data for 2 years. I want to average it to get the hourly average for each hour of the day, for each month. I tried this:
import xarray as xr
ds = xr.open_mfdataset('ecmwf_usa_2015.nc')
ds.groupby(['time.month', 'time.hour']).mean('time')
but I get this error:
*** TypeError: `group` must be an xarray.DataArray or the name of an xarray variable or dimension
How can I fix this? If I do this:
ds.groupby('time.month', 'time.hour').mean('time')
I do not get an error, but the result has a time dimension of 12 (one value for each month), whereas I want an hourly average for each month, i.e. 24 values for each of 12 months. Data is available here: https://www.dropbox.com/s/yqgg80wn8bjdksy/ecmwf_usa_2015.nc?dl=0
Answer
You are getting TypeError: `group` must be an xarray.DataArray or the name of an xarray variable or dimension because ds.groupby() is supposed to take a single xarray dataset variable or array, and you passed a list of variables.
Refer to the group by documentation: convert the dataset into splits or bins first, and then apply groupby('time.hour'). This is needed because applying a groupby over month and then another over hour in one call would aggregate over all the data; if you first split it into monthly groups, you can then apply the hourly mean within each month.
1. You can try this approach as mentioned in the documentation:

xarray supports "group by" operations with the same API as pandas to implement the split-apply-combine strategy:
- Split your data into multiple independent groups. => Split them by month, e.g. using groupby_bins.
- Apply some function to each group. => Apply a groupby within each group.
- Combine your groups back into a single data object. => Apply the aggregate function mean('time').
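The three steps above can be sketched on synthetic data (the variable name t2m and the 2015-2016 hourly range are assumptions standing in for the linked file, which is not reproduced here):

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for the 2-year hourly netCDF file; "t2m" is a
# hypothetical variable name, not necessarily the one in the real file.
time = pd.date_range("2015-01-01", "2016-12-31 23:00", freq="h")
ds = xr.Dataset(
    {"t2m": ("time", np.random.rand(time.size))},
    coords={"time": time},
)

# Split by month, apply an hourly groupby-mean inside each month,
# and let xarray combine the pieces back into one object.
climatology = ds.groupby("time.month").map(
    lambda month: month.groupby("time.hour").mean("time")
)
print(dict(climatology["t2m"].sizes))  # 12 months x 24 hours
```

The result has a month dimension of 12 and an hour dimension of 24, i.e. one average per hour of the day for each month, which is what the question asks for.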
2. Convert it into a pandas dataframe and use group by.

Warning: not all netCDF files are convertible to a pandas dataframe, and metadata may be lost in the conversion.
Convert ds into a pandas dataframe with df = ds.to_dataframe() and group as you require using pandas.Grouper, like

df = ds.to_dataframe().reset_index().set_index('time')
df.groupby([pd.Grouper(freq='1M'), df.index.hour]).mean()
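The pandas route can also be sketched end-to-end on synthetic data (again with a hypothetical t2m variable); grouping the DatetimeIndex by month and hour of day yields the 12 x 24 table directly:

```python
import numpy as np
import pandas as pd

# Synthetic hourly series standing in for ds.to_dataframe();
# "t2m" is a hypothetical variable name.
time = pd.date_range("2015-01-01", "2016-12-31 23:00", freq="h")
df = pd.DataFrame({"t2m": np.random.rand(time.size)}, index=time)

# Group by calendar month and hour of day, then average.
hourly_by_month = df.groupby([df.index.month, df.index.hour]).mean()
hourly_by_month.index.names = ["month", "hour"]
print(hourly_by_month.shape)  # (288, 1): 12 months x 24 hours
```

Each row of the result is the mean of all values that fall in a given (month, hour) pair across both years.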
Note: I saw a couple of answers using pandas.TimeGrouper, but it is deprecated and one has to use pandas.Grouper now.
Since your dataset is big, the question does not include a minimal sample of the data, and working on the full file consumes heavy resources, I would suggest looking at these pandas examples:
- group by weekdays
- group by time
- group by date range depending on each row
- group and count rows by month and year