Problem description
I have a reasonably large 1000 x 4000 pixel xr.DataArray returned from an OpenDataCube query, and a large set (> 200,000) of xy point values. I need to sample the array to return a value under each xy point, with interpolation (e.g. if a point lands halfway between a 0 pixel and a 1.0 pixel, the value returned should be 0.5).
xr.interp lets me easily sample interpolated values, but it returns a huge matrix containing every combination of all the x and y values, rather than just the value for each xy point itself. I've tried using np.diagonal to extract just the xy point values, but this is slow, very quickly runs into memory issues, and feels inefficient given that I still have to wait for every combination of values to be interpolated via xr.interp.
Reproducible example
(using just 10,000 sample points; ideally, I need something that can scale to > 200,000 or more):
import numpy as np
import xarray as xr

# Create sample array
width, height = 1000, 4000
val_array = xr.DataArray(data=np.random.randint(0, 10, size=(height, width)).astype(np.float32),
                         coords={'x': np.linspace(3000, 5000, width),
                                 'y': np.linspace(-3000, -5000, height)},
                         dims=['y', 'x'])

# Create sample points
n = 10000
x_points = np.random.randint(3000, 5000, size=n)
y_points = np.random.randint(-5000, -3000, size=n)
Current approach
%%timeit
# ATTEMPT 1
np.diagonal(val_array.interp(x=x_points, y=y_points).squeeze().values)
32.6 s ± 1.01 s per loop (mean ± std. dev. of 7 runs, 1 loop each)
Does anyone know of a faster or more memory-efficient way to achieve this?
Recommended answer
To avoid the full grid, you need to introduce a new dimension. Wrap the x and y point values in DataArrays that share that dimension (here called z):
x = xr.DataArray(x_points, dims='z')
y = xr.DataArray(y_points, dims='z')
val_array.interp(x=x, y=y)
This gives you an array along just the new z dimension:
<xarray.DataArray (z: 10000)>
array([4.368132, 2.139781, 5.693636, ..., 3.7505 , 3.713589, 2.28494 ])
Coordinates:
    x        (z) int64 4647 4471 4692 3942 3468 ... 3040 3993 3027 4427 3749
    y        (z) int64 -3744 -4074 -3634 -3289 -3221 ... -4195 -4131 -4814 -3362
Dimensions without coordinates: z
36.9 ms ± 1.25 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
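The gap between the two timings is easier to see with a rough size estimate. The sketch below is back-of-the-envelope arithmetic, not a measurement. It assumes 8 bytes per interpolated value (float64), and the helper names grid_bytes and pointwise_bytes are made up for illustration.

```python
def grid_bytes(n_points, itemsize=8):
    """Bytes for the full n x n grid produced by interpolating with plain 1-D arrays.

    itemsize=8 is an assumption (float64 output).
    """
    return n_points * n_points * itemsize


def pointwise_bytes(n_points, itemsize=8):
    """Bytes for one interpolated value per point along a shared dimension."""
    return n_points * itemsize


for n in (10_000, 200_000):
    print(f"{n:>7} points: "
          f"grid ~ {grid_bytes(n) / 1e9:,.1f} GB, "
          f"pointwise ~ {pointwise_bytes(n) / 1e6:,.2f} MB")
```

Under that assumption the grid approach needs on the order of hundreds of gigabytes at 200,000 points, while the pointwise result stays in the low megabytes.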
The xarray documentation describes this under Advanced Interpolation.
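As a quick sanity check, the following minimal sketch compares the two approaches on a tiny 2 x 2 grid. The array and point values here are invented for illustration and are not from the question. It assumes xarray and scipy are installed. The first point sits halfway between a 0 pixel and a 1.0 pixel, which is the case the question describes.

```python
import numpy as np
import xarray as xr

# Tiny illustrative grid (hypothetical values, not the question's data)
grid = xr.DataArray(
    np.array([[0.0, 1.0],
              [2.0, 3.0]]),
    coords={"y": [0.0, 1.0], "x": [0.0, 1.0]},
    dims=["y", "x"],
)

# Three sample points, wrapped so that x and y share the new dimension "z"
px = xr.DataArray([0.5, 0.0, 1.0], dims="z")
py = xr.DataArray([0.0, 0.5, 1.0], dims="z")

# Pointwise interpolation: one value per (x, y) pair
pointwise = grid.interp(x=px, y=py)

# Plain arrays: every combination of x and y is interpolated
outer = grid.interp(x=px.values, y=py.values)

print(pointwise.dims, pointwise.values)
print(outer.shape)
print(np.allclose(np.diagonal(outer.values), pointwise.values))
```

The pointwise result has a single z dimension with one value per point, while the plain-array call produces a 3 x 3 grid whose diagonal holds the same values.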