This article describes how to compute, from a netCDF file, the hourly average for each hour of the day in each month. It should be a useful reference for anyone facing the same problem; follow along below.

Problem description

I have a netCDF file with a time dimension containing data by the hour for 2 years. I want to average it to get an hourly average for each hour of the day for each month. I tried this:

import xarray as xr
ds = xr.open_mfdataset('ecmwf_usa_2015.nc')    
ds.groupby(['time.month', 'time.hour']).mean('time')

But I get this error:

*** TypeError: `group` must be an xarray.DataArray or the name of an xarray variable or dimension

How can I fix this? If I do this:

ds.groupby('time.month', 'time.hour').mean('time')

I do not get an error, but the result has a time dimension of 12 (one value for each month), whereas I want an hourly average for each month, i.e. 24 values for each of the 12 months. The data is available here: https://www.dropbox.com/s/yqgg80wn8bjdksy/ecmwf_usa_2015.nc?dl=0

Recommended answer

You are getting TypeError: `group` must be an xarray.DataArray or the name of an xarray variable or dimension because ds.groupby() expects a single xarray variable or dimension name (or a DataArray), but you passed it a list of names.
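
For comparison, each of the following single-key calls runs without error (a quick illustration, assuming the two years of hourly data from the question), but neither gives the per-month hourly profile on its own:

ds.groupby('time.month').mean('time')   # a single key works: 12 monthly means
ds.groupby('time.hour').mean('time')    # also works: 24 hourly means, but averaged across all months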

Refer to the groupby documentation: convert the dataset into splits or bins (one per month) and then apply groupby('time.hour').

This is because applying groupby on the month and then on the hour aggregates all of the data; if you first split the data by month, you can then apply a groupby mean for the hours within each month.

You can try the approach described in the documentation:

xarray supports "group by" operations with the same API as pandas to implement the split-apply-combine strategy (a minimal sketch follows the list below):

  • Split your data into multiple independent groups. => split them by month using groupby_bins
  • Apply some function to each group. => apply groupby('time.hour')
  • Combine your groups back into a single data object. => apply the aggregate function mean('time')
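
Here is a minimal sketch of that split-apply-combine idea, assuming the file opens with xr.open_dataset and using iteration over groupby('time.month') plus xr.concat in place of groupby_bins; the names results and hourly_by_month are illustrative only:

import pandas as pd
import xarray as xr

ds = xr.open_dataset('ecmwf_usa_2015.nc')

# Split: one sub-dataset per calendar month.
months, results = [], []
for month, ds_month in ds.groupby('time.month'):
    # Apply: average each month's data over the hour of the day (24 values).
    months.append(month)
    results.append(ds_month.groupby('time.hour').mean('time'))

# Combine: stack the 12 monthly results along a new 'month' dimension,
# giving 12 x 24 values per variable (plus any spatial dimensions).
hourly_by_month = xr.concat(results, dim=pd.Index(months, name='month'))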

2. Convert it into a pandas DataFrame and use groupby

Warning: not every netCDF converts cleanly to a pandas DataFrame, and metadata may be lost during the conversion.

Convert ds into a pandas DataFrame with df = ds.to_dataframe() and group it as you need using pandas.Grouper, for example:

df.reset_index().set_index('time').groupby([pd.Grouper(freq='1M'), 't2m']).mean()
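
(The reset_index() call is needed because ds.to_dataframe() puts 'time' into the index rather than the columns.) The pd.Grouper call above only buckets by calendar month; to get the 24 hourly means for each of the 12 months, a sketch along these lines should work. It assumes the DataFrame's MultiIndex includes a 'time' level; hourly_by_month is an illustrative name, and the latitude/longitude coordinate names mentioned in the comment are assumptions based on typical ECMWF files:

import xarray as xr

ds = xr.open_dataset('ecmwf_usa_2015.nc')
df = ds.to_dataframe()                     # MultiIndex over the coordinates, including 'time'

time_idx = df.index.get_level_values('time')
# 12 months x 24 hours of the day, averaged over both years and over the whole grid;
# add the latitude/longitude index levels to the keys to keep the spatial dimensions.
hourly_by_month = df.groupby([time_idx.month.rename('month'),
                              time_idx.hour.rename('hour')]).mean()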

Note: I saw a couple of answers using pandas.TimeGrouper, but it is deprecated and one has to use pandas.Grouper now.

Since your dataset is large, the question does not include a minimal sample, and working on the full file consumes heavy resources, I would suggest looking at these pandas examples:

  1. group by weekdays
  2. group by time
  3. groupby-date-range-depending-on-each-row
  4. group-and-count-rows-by-month-and-year

That concludes this article on computing hourly averages for each month from a netCDF file. We hope the recommended answer is helpful, and thank you for your support!
