我有以下 shapefilenetcdf file

我想从包含在 shapefile 边界内的 netcdf 文件中提取数据。

你对我如何实现这一目标有什么建议吗?

shapefile 对应于 SREX 区域 11 North Europe (NEU),netcdf 文件是 CMIP6 气候模型数据输出(ua 变量)的示例。我想要的输出必须是 netcdf 格式。

更新

到目前为止,我尝试使用 NCL 和 CDO 创建一个 netcdf 掩码,并将此掩码应用于原始 netcdf 数据集。下面是步骤(和 NCL scripts ):

#################
## remove plev dimension from netcdf file
cdo --reduce_dim -copy nc_file.nc nc_file2.nc

## convert longitude to -180, 180
cdo sellonlatbox,-180,180,-90,90 nc_file2.nc nc_file3.nc

## create mask
ncl create_nc_mask.ncl

## apply mask
cdo div nc_file3.nc shape1_mask.nc nc_file4.nc
#################

输出几乎是正确的。见下图。但是 shapefile 的南部边界(SREX 11,NEU)没有被正确捕获。所以我想生成 netcdf 掩码的 NCL 脚本有问题。

python - 从包含在 shapefile 边界内的 netcdf 文件中提取数据-LMLPHP

最佳答案

重新使用一些旧的脚本/代码,我很快想出了一个 Python 解决方案。它基本上只是遍历所有网格点,并检查每个网格点是在形状文件中的多边形内部还是外部。结果是变量 mask(带有 True/False 的数组),它可用于屏蔽您的 NetCDF 变量。

注意:这使用 Numba(所有 @jit 行)来加速代码,尽管在这种情况下这不是必需的。如果您没有 Numba,您可以将它们注释掉。

import matplotlib.pyplot as pl
import netCDF4 as nc4
import numpy as np
import fiona
from numba import jit

@jit(nopython=True, nogil=True)
def distance(x1, y1, x2, y2):
    """
    Calculate distance from (x1,y1) to (x2,y2)
    """
    return ((x1-x2)**2 + (y1-y2)**2)**0.5

@jit(nopython=True, nogil=True)
def point_is_on_line(x, y, x1, y1, x2, y2):
    """
    Check whether point (x,y) is on line (x1,y1) to (x2,y2)
    """

    d1 = distance(x,  y,  x1, y1)
    d2 = distance(x,  y,  x2, y2)
    d3 = distance(x1, y1, x2, y2)

    eps = 1e-12
    return np.abs((d1+d2)-d3) < eps

@jit(nopython=True, nogil=True)
def is_left(xp, yp, x0, y0, x1, y1):
    """
    Check whether point (xp,yp) is left of line segment ((x0,y0) to (x1,y1))
    returns:  >0 if left of line, 0 if on line, <0 if right of line
    """

    return (x1-x0) * (yp-y0) - (xp-x0) * (y1-y0)

@jit(nopython=True, nogil=True)
def is_inside(xp, yp, x_set, y_set, size):
    """
    Given location (xp,yp) and set of line segments (x_set, y_set), determine
    whether (xp,yp) is inside polygon.
    """

    # First simple check on bounds
    if (xp < x_set.min() or xp > x_set.max() or yp < y_set.min() or yp > y_set.max()):
        return False

    wn = 0
    for i in range(size-1):

        # Second check: see if point exactly on line segment:
        if point_is_on_line(xp, yp, x_set[i], y_set[i], x_set[i+1], y_set[i+1]):
            return False

        # Calculate winding number
        if (y_set[i] <= yp):
            if (y_set[i+1] > yp):
                if (is_left(xp, yp, x_set[i], y_set[i], x_set[i+1], y_set[i+1]) > 0):
                    wn += 1
        else:
            if (y_set[i+1] <= yp):
                if (is_left(xp, yp, x_set[i], y_set[i], x_set[i+1], y_set[i+1]) < 0):
                    wn -= 1

    if wn == 0:
        return False
    else:
        return True

@jit(nopython=True, nogil=True)
def calc_mask(mask, lon, lat, shp_lon, shp_lat):
    """
    Calculate mask of grid points which are inside `shp_lon, shp_lat`
    """

    for j in range(lat.size):
        for i in range(lon.size):
            if is_inside(lon[i], lat[j], shp_lon, shp_lat, shp_lon.size):
                mask[j,i] = True


if __name__ == '__main__':

    # Selection of time and level:
    time = 0
    plev = 0

    # Read NetCDF variables, shifting the longitudes
    # from 0-360 to -180,180, like the shape file:
    nc = nc4.Dataset('nc_file.nc')
    nc_lon = nc.variables['lon'][:]-180.
    nc_lat = nc.variables['lat'][:]
    nc_ua  = nc.variables['ua'][time,plev,:,:]

    # Read shapefile and first feature
    fc = fiona.open("shape1.shp")
    feature = next(iter(fc))

    # Extract array of lat/lon coordinates:
    coords = feature['geometry']['coordinates'][0]
    shp_lon = np.array(coords)[:,0]
    shp_lat = np.array(coords)[:,1]

    # Calculate mask
    mask = np.zeros_like(nc_ua, dtype=bool)
    calc_mask(mask, nc_lon, nc_lat, shp_lon, shp_lat)

    # Mask the data array
    nc_ua_masked = np.ma.masked_where(~mask, nc_ua)

    # Plot!
    pl.figure(figsize=(8,4))
    pl.subplot(121)
    pl.pcolormesh(nc_lon, nc_lat, nc_ua, vmin=-40, vmax=105)
    pl.xlim(-20, 50)
    pl.ylim(40, 80)

    pl.subplot(122)
    pl.pcolormesh(nc_lon, nc_lat, nc_ua_masked, vmin=-40, vmax=105)
    pl.xlim(-20, 50)
    pl.ylim(40, 80)

    pl.tight_layout()

python - 从包含在 shapefile 边界内的 netcdf 文件中提取数据-LMLPHP

编辑

要将掩码写入 NetCDF,可以使用以下方法:
nc_out = nc4.Dataset('mask.nc', 'w')
nc_out.createDimension('lon', nc_lon.size)
nc_out.createDimension('lat', nc_lat.size)

nc_mask_out = nc_out.createVariable('mask', 'i2', ('lat','lon'))
nc_lon_out = nc_out.createVariable('lon', 'f8', ('lon'))
nc_lat_out = nc_out.createVariable('lat', 'f8', ('lat'))

nc_mask_out[:,:] = mask[:,:]  # Or ~mask to reverse it
nc_lon_out[:] = nc_lon[:]     # With +180 if needed
nc_lat_out[:] = nc_lat[:]

nc_out.close()

关于python - 从包含在 shapefile 边界内的 netcdf 文件中提取数据,我们在Stack Overflow上找到一个类似的问题:https://stackoverflow.com/questions/60233081/

10-12 18:26