问题描述
此问答旨在作为关于使用 scipy 的二维(和多维)插值的规范(-ish).经常有各种多维插值方法的基本语法问题,希望能把这些都说清楚.
我有一组分散的二维数据点,我想将它们绘制为一个漂亮的表面,最好在 contourf
或 plot_surface
中使用类似的东西代码>matplotlib.pyplot.如何使用 scipy 将二维或多维数据插入到网格中?
我已经找到了 scipy.interpolate
子包,但是在使用 interp2d
或 bisplrep
或 时我不断收到错误griddata
或 RBFInterpolator
(或旧的 Rbf
).这些方法的正确语法是什么?
免责声明:我写这篇博文主要是考虑到语法和一般行为.我不熟悉所描述方法的内存和 CPU 方面,我将这个答案针对那些拥有相当小的数据集的人,因此插值的质量可以成为要考虑的主要方面.我知道在处理非常大的数据集时,性能更好的方法(即 griddata
和 RBFInterpolator
没有 neighbors
关键字参数)可能不可行.
请注意,此答案使用新的
我将首先演示这三种方法在这四种测试中的表现,然后我将详细介绍所有三种方法的语法.如果您知道应该从方法中得到什么,您可能不想浪费时间学习它的语法(看着您,interp2d
).
测试数据
为了明确起见,这里是我生成输入数据的代码.虽然在这种特定情况下,我显然知道数据背后的函数,但我只会使用它来为插值方法生成输入.我使用 numpy 是为了方便(主要是为了生成数据),但单独使用 scipy 也足够了.
将 numpy 导入为 np导入 scipy.interpolate 作为 interp# 网格生成的辅助函数def gimme_mesh(n):最小值 = -1最大值 = 1# 产生一个不对称的形状以解决换位问题返回 np.meshgrid(np.linspace(minval, maxval, n),np.linspace(minval, maxval, n + 1))# 设置底层测试函数,向量化def fun_smooth(x, y):返回 np.cos(np.pi*x) * np.sin(np.pi*y)def fun_evil(x, y):# 注意单一来源;函数在那里没有唯一的限制返回 np.where(x**2 + y**2 > 1e-10, x*y/(x**2+y**2), 0.5)# 稀疏输入网格,形状为 6x7N_sparse = 6x_sparse, y_sparse = gimme_mesh(N_sparse)z_sparse_smooth = fun_smooth(x_sparse,y_sparse)z_sparse_evil = fun_evil(x_sparse,y_sparse)# 分散的输入点,共 10^2 个(形状 (100,))N_分散 = 10rng = np.random.default_rng()x_scattered, y_scattered = rng.random((2, N_scattered**2))*2 - 1z_scattered_smooth = fun_smooth(x_scattered,y_scattered)z_scattered_evil = fun_evil(x_scattered,y_scattered)# 密集输出网格,形状为 20x21N_dense = 20x_dense, y_dense = gimme_mesh(N_dense)
平滑函数和上采样
让我们从最简单的任务开始.以下是从形状为 [6, 7]
的网格到 [20, 21]
之一的上采样如何实现平滑测试功能:
尽管这是一项简单的任务,但输出之间已经存在细微差别.乍一看,所有三个输出都是合理的.根据我们对底层函数的先验知识,有两个特征需要注意:griddata
的中间情况使数据失真最大.注意图的 y == -1
边界(最靠近 x
标签):函数应该严格为零(因为 y == -1
是平滑函数的节点线),但 griddata
不是这种情况.还要注意图的 x == -1
边界(在后面,向左):基础函数在 [-1,-0.5]
,但 griddata
输出在该区域清楚地显示了非零梯度.效果很微妙,但仍然是一种偏见.
邪恶函数和上采样
更难的任务是对我们的邪恶函数执行上采样:
三种方法之间开始出现明显差异.查看曲面图,interp2d
的输出中出现了明显的虚假极值(注意绘制曲面右侧的两个驼峰).虽然 griddata
和 RBFInterpolator
乍一看似乎产生了相似的结果,但在 [0.4, -0.4]
附近产生了底层的局部最小值功能.
然而,RBFInterpolator
在一个关键方面要优越得多:它尊重底层函数的对称性(当然,样本网格的对称性也使之成为可能).griddata
的输出打破了样本点的对称性,这在平滑情况下已经很弱了.
平滑的函数和分散的数据
大多数情况下,人们希望对分散的数据进行插值.出于这个原因,我希望这些测试更加重要.如上所示,样本点是在感兴趣的域中伪均匀地选择的.在现实场景中,每次测量可能会产生额外的噪音,您应该考虑从插入原始数据开始是否有意义.
平滑函数的输出:
现在已经有一些恐怖节目在上演.我将 interp2d
的输出剪裁到 [-1, 1]
之间,专门用于绘图,以便至少保留最少量的信息.很明显,虽然存在一些基本形状,但存在巨大的噪声区域,该方法完全失效.griddata
的第二种情况很好地再现了形状,但请注意等高线图边界处的白色区域.这是因为 griddata
仅在输入数据点的凸包内起作用(换句话说,它不执行任何外推).我保留了位于凸包外的输出点的默认 NaN 值. 考虑到这些特性,RBFInterpolator
似乎表现最好.
邪恶的功能和分散的数据
我们一直在等待的那一刻:
interp2d
放弃并不奇怪.事实上,在调用 interp2d
的过程中,你应该期待一些友好的 RuntimeWarning
抱怨无法构造样条曲线.至于其他两种方法,RBFInterpolator
似乎产生了最好的输出,即使在结果外推的域的边界附近也是如此.
那么让我按优先顺序从高到低对这三种方法说几句(这样最坏的方法最不可能被任何人阅读).
scipy.interpolate.RBFInterpolator
RBFInterpolator
类名称中的 RBF 代表径向基函数".老实说,在我开始研究这篇文章之前,我从未考虑过这种方法,但我很确定我将来会使用这些方法.
就像基于样条的方法(见后面),使用分为两步:第一步,根据输入数据创建一个可调用的RBFInterpolator
类实例,然后为给定的数据调用这个对象输出网格以获得插值结果.平滑上采样测试的示例:
import scipy.interpolate as interpsparse_points = np.stack([x_sparse.ravel(), y_sparse.ravel()], -1) # 二维形状 (N, 2)Dense_points = np.stack([x_dense.ravel(), y_dense.ravel()], -1) # 二维形状 (N, 2)zfun_smooth_rbf = interp.RBFInterpolator(sparse_points, z_sparse_smooth.ravel(),Smoothing=0, kernel='cubic') # 显式默认平滑=0 用于插值z_dense_smooth_rbf = zfun_smooth_rbf(dense_points).reshape(x_dense.shape) # 不是真正的函数,而是一个可调用的类实例zfun_evil_rbf = interp.RBFInterpolator(sparse_points, z_sparse_evil.ravel(),Smoothing=0, kernel='cubic') # 显式默认平滑=0 用于插值z_dense_evil_rbf = zfun_evil_rbf(dense_points).reshape(x_dense.shape) # 不是真正的函数,而是一个可调用的类实例
请注意,为了使 RBFInterpolator
的 API 满意,我们必须进行一些数组构建.由于我们必须将 2d 点作为形状为 (N, 2)
的数组传递,因此我们必须展平输入网格并将两个展平的数组堆叠起来.构造的插值器也需要这种格式的查询点,结果将是形状 (N,)
的一维数组,我们必须重新整形以匹配我们的二维网格以进行绘图.由于RBFInterpolator
不对输入点的维数做任何假设,它支持任意维的插值.
所以,scipy.interpolate.RBFInterpolator
- 即使对于疯狂的输入数据也能产生表现良好的输出
- 支持更高维度的插值
- 在输入点的凸包外外推(当然外推总是一场赌博,您通常根本不应该依赖它)
- 第一步是创建一个插值器,因此在不同的输出点中对其进行评估可以减少额外的工作
- 可以有任意形状的输出点数组(而不是被限制为矩形网格,见下文)
- 更有可能保持输入数据的对称性
- 支持关键字
kernel
的多种径向函数:multiquadric
、inverse_multiquadric
、inverse_quadratic
、高斯
、linear
、cubic
、quintic
、thin_plate_spline
(默认).从 SciPy 1.7.0 开始,由于技术原因,该类不允许传递自定义可调用项,但可能会在未来版本中添加. - 可以通过增加
smoothing
参数来给出不精确的插值
RBF 内插的一个缺点是内插 N
数据点涉及对 N x N
矩阵求逆.这种二次复杂性非常迅速地破坏了大量数据点的内存需求.但是,新的 RBFInterpolator
类还支持 neighbors
关键字参数,该参数将每个径向基函数的计算限制为 k
个最近的邻居,从而减少内存需求.
scipy.interpolate.griddata
我以前最喜欢的 griddata
是用于任意维度插值的通用工具.除了为节点凸包之外的点设置单个预设值之外,它不会执行外推,但由于外推是一件非常善变和危险的事情,因此这不一定是骗局.用法示例:
sparse_points = np.stack([x_sparse.ravel(), y_sparse.ravel()], -1) # 二维形状 (N, 2)z_dense_smooth_griddata = interp.griddata(sparse_points, z_sparse_smooth.ravel(),(x_dense, y_dense), method='cubic') # 默认方法是线性的
请注意,输入数组需要与 RBFInterpolator
相同的数组转换.输入点必须在 [N, D]
维的 D
维数组中指定,或者作为一维数组的元组:
z_dense_smooth_griddata = interp.griddata((x_sparse.ravel(), y_sparse.ravel()),z_sparse_smooth.ravel(), (x_dense, y_dense), method='cubic')
可以将输出点数组指定为任意维度数组的元组(如上面的两个片段),这为我们提供了更大的灵活性.
简而言之,scipy.interpolate.griddata
- 即使对于疯狂的输入数据也能产生表现良好的输出
- 支持更高维度的插值
- 不执行外推,可以为输入点凸包外的输出设置单个值(参见
fill_value
) - 在一次调用中计算内插值,因此从头开始探测多组输出点
- 可以有任意形状的输出点
- 支持任意维度的最近邻和线性插值,1d 和 2d 三次.最近邻和线性插值分别在引擎盖下使用
NearestNDInterpolator
和LinearNDInterpolator
.1d 三次插值使用样条,2d 三次插值使用CloughTocher2DInterpolator
构造一个连续可微的分段三次插值器. - 可能违反输入数据的对称性
scipy.interpolate.interp2d
/scipy.interpolate.bisplrep
我讨论 interp2d
及其亲属的唯一原因是它的名称具有欺骗性,人们可能会尝试使用它.剧透警告:不要使用它(从 scipy 1.7.0 版开始).它已经比之前的主题更加特殊,因为它专门用于二维插值,但我怀疑这是迄今为止多元插值最常见的情况.
就语法而言,interp2d
与 RBFInterpolator
的相似之处在于它首先需要构造一个插值实例,该实例可以被调用以提供实际的插值值.然而,有一个问题:输出点必须位于矩形网格上,因此调用内插器的输入必须是跨越输出网格的一维向量,就像来自 numpy.meshgrid
:
# 提醒:x_sparse 和 y_sparse 的形状为 [6, 7] 来自 numpy.meshgridzfun_smooth_interp2d = interp.interp2d(x_sparse, y_sparse, z_sparse_smooth, kind='cubic') # 默认种类为'线性'# 提醒:x_dense 和 y_dense 是来自 numpy.meshgrid 的形状 (20, 21)xvec = x_dense[0,:] # 唯一 x 值的一维数组,20 个元素yvec = y_dense[:,0] # 唯一 y 值的一维数组,21 个元素z_dense_smooth_interp2d = zfun_smooth_interp2d(xvec, yvec) # 输出是(20, 21)形数组
使用 interp2d
时最常见的错误之一是将完整的 2d 网格放入插值调用中,这会导致内存消耗激增,并可能导致草率的 MemoryError
.
现在,interp2d
的最大问题是它通常不起作用.为了理解这一点,我们必须深入了解.事实证明,interp2d
是低级函数 bisplrep
+ bisplev
,它们又是 FITPACK 例程的包装器(用 Fortran 编写).对上一个示例的等效调用是
kind = '立方'如果种类=='线性':kx = ky = 1elif 种类 == '立方':kx = ky = 3elif 种类 == 'quintic':kx = ky = 5# bisplrep 构造一个样条表示,bisplev 在给定点评估样条bisp_smooth = interp.bisplrep(x_sparse.ravel(), y_sparse.ravel(),z_sparse_smooth.ravel(), kx=kx, ky=ky, s=0)z_dense_smooth_bisplrep = interp.bisplev(xvec, yvec, bisp_smooth).T # 注意转置
现在,关于 interp2d
的事情是:(在 scipy 1.7.0 版中)有一个很好的 interpolate/interpolate.py
中对 interp2d
的注释:
如果不是矩形网格:# TODO:surfit 真的不适合插值!self.tck = fitpack.bisplrep(x, y, z, kx=kx, ky=ky, s=0.0)
实际上在interpolate/fitpack.py
中,在bisplrep
中有一些设置,最终
tx, ty, c, o = _fitpack._surfit(x, y, z, w, xb, xe, yb, ye, kx, ky,任务, s, eps, tx, ty, nxest, nyest,wrk, lwrk1, lwrk2)
就是这样.interp2d
底层的例程并不是真的要执行插值.它们可能足以处理行为足够好的数据,但在现实情况下,您可能想要使用其他东西.
总结一下,interpolate.interp2d
- 即使使用良好的数据也可能导致伪影
- 专门用于二元问题(尽管在网格上定义的输入点有有限的
interpn
) - 执行外推
- 第一步是创建一个插值器,因此在不同的输出点中对其进行评估可以减少额外的工作
- 只能在矩形网格上产生输出,对于分散的输出,您必须在循环中调用插值器
- 支持线性、三次和五次插值
- 可能违反输入数据的对称性
我相当肯定 RBFInterpolator
的 cubic
和 linear
类基函数不会与同名的其他插值器完全对应.
这些 NaN 也是表面图看起来如此奇怪的原因:matplotlib 历来难以用适当的深度信息绘制复杂的 3d 对象.数据中的 NaN 值会混淆渲染器,因此应该在后面的表面部分被绘制在前面.这是可视化的问题,而不是插值.
I have a set of scattered two-dimensional data points, and I would like to plot them as a nice surface, preferably using something like contourf
or plot_surface
in matplotlib.pyplot
. How can I interpolate my two-dimensional or multidimensional data to a mesh using scipy?
I've found the scipy.interpolate
sub-package, but I keep getting errors when using interp2d
or bisplrep
or griddata
or RBFInterpolator
(or the older Rbf
). What is the proper syntax of these methods?
Disclaimer: I'm mostly writing this post with syntactical considerations and general behaviour in mind. I'm not familiar with the memory and CPU aspect of the methods described, and I aim this answer at those who have reasonably small sets of data, such that the quality of the interpolation can be the main aspect to consider. I am aware that when working with very large data sets, the better-performing methods (namely griddata
and RBFInterpolator
without a neighbors
keyword argument) might not be feasible.
Note that this answer uses the new RBFInterpolator
class introduced in SciPy
1.7.0. For the legacy Rbf
class see the previous version of this answer.
I'm going to compare three kinds of multi-dimensional interpolation methods (interp2d
/splines, griddata
and RBFInterpolator
). I will subject them to two kinds of interpolation tasks and two kinds of underlying functions (points from which are to be interpolated). The specific examples will demonstrate two-dimensional interpolation, but the viable methods are applicable in arbitrary dimensions. Each method provides various kinds of interpolation; in all cases I will use cubic interpolation (or something close). It's important to note that whenever you use interpolation you introduce bias compared to your raw data, and the specific methods used affect the artifacts that you will end up with. Always be aware of this, and interpolate responsibly.
The two interpolation tasks will be
- upsampling (input data is on a rectangular grid, output data is on a denser grid)
- interpolation of scattered data onto a regular grid
The two functions (over the domain [x, y] in [-1, 1]x[-1, 1]
) will be
- a smooth and friendly function:
cos(pi*x)*sin(pi*y)
; range in[-1, 1]
- an evil (and in particular, non-continuous) function:
x*y / (x^2 + y^2)
with a value of 0.5 near the origin; range in[-0.5, 0.5]
Here's how they look:
I will first demonstrate how the three methods behave under these four tests, then I'll detail the syntax of all three. If you know what you should expect from a method, you might not want to waste your time learning its syntax (looking at you, interp2d
).
Test data
For the sake of explicitness, here is the code with which I generated the input data. While in this specific case I'm obviously aware of the function underlying the data, I will only use this to generate input for the interpolation methods. I use numpy for convenience (and mostly for generating the data), but scipy alone would suffice too.
import numpy as np
import scipy.interpolate as interp
# auxiliary function for mesh generation
def gimme_mesh(n):
minval = -1
maxval = 1
# produce an asymmetric shape in order to catch issues with transpositions
return np.meshgrid(np.linspace(minval, maxval, n),
np.linspace(minval, maxval, n + 1))
# set up underlying test functions, vectorized
def fun_smooth(x, y):
return np.cos(np.pi*x) * np.sin(np.pi*y)
def fun_evil(x, y):
# watch out for singular origin; function has no unique limit there
return np.where(x**2 + y**2 > 1e-10, x*y/(x**2+y**2), 0.5)
# sparse input mesh, 6x7 in shape
N_sparse = 6
x_sparse, y_sparse = gimme_mesh(N_sparse)
z_sparse_smooth = fun_smooth(x_sparse, y_sparse)
z_sparse_evil = fun_evil(x_sparse, y_sparse)
# scattered input points, 10^2 altogether (shape (100,))
N_scattered = 10
rng = np.random.default_rng()
x_scattered, y_scattered = rng.random((2, N_scattered**2))*2 - 1
z_scattered_smooth = fun_smooth(x_scattered, y_scattered)
z_scattered_evil = fun_evil(x_scattered, y_scattered)
# dense output mesh, 20x21 in shape
N_dense = 20
x_dense, y_dense = gimme_mesh(N_dense)
Smooth function and upsampling
Let's start with the easiest task. Here's how an upsampling from a mesh of shape [6, 7]
to one of [20, 21]
works out for the smooth test function:
Even though this is a simple task, there are already subtle differences between the outputs. At a first glance all three outputs are reasonable. There are two features to note, based on our prior knowledge of the underlying function: the middle case of griddata
distorts the data most. Note the y == -1
boundary of the plot (nearest the x
label): the function should be strictly zero (since y == -1
is a nodal line for the smooth function), yet this is not the case for griddata
. Also note the x == -1
boundary of the plots (behind, to the left): the underlying function has a local maximum (implying zero gradient near the boundary) at [-1, -0.5]
, yet the griddata
output shows clearly non-zero gradient in this region. The effect is subtle, but it's a bias none the less.
Evil function and upsampling
A bit harder task is to perform upsampling on our evil function:
Clear differences are starting to show among the three methods. Looking at the surface plots, there are clear spurious extrema appearing in the output from interp2d
(note the two humps on the right side of the plotted surface). While griddata
and RBFInterpolator
seem to produce similar results at first glance, producing local minima near [0.4, -0.4]
that is absent from the underlying function.
However, there is one crucial aspect in which RBFInterpolator
is far superior: it respects the symmetry of the underlying function (which is of course also made possible by the symmetry of the sample mesh). The output from griddata
breaks the symmetry of the sample points, which is already weakly visible in the smooth case.
Smooth function and scattered data
Most often one wants to perform interpolation on scattered data. For this reason I expect these tests to be more important. As shown above, the sample points were chosen pseudo-uniformly in the domain of interest. In realistic scenarios you might have additional noise with each measurement, and you should consider whether it makes sense to interpolate your raw data to begin with.
Output for the smooth function:
Now there's already a bit of a horror show going on. I clipped the output from interp2d
to between [-1, 1]
exclusively for plotting, in order to preserve at least a minimal amount of information. It's clear that while some of the underlying shape is present, there are huge noisy regions where the method completely breaks down. The second case of griddata
reproduces the shape fairly nicely, but note the white regions at the border of the contour plot. This is due to the fact that griddata
only works inside the convex hull of the input data points (in other words, it doesn't perform any extrapolation). I kept the default NaN value for output points lying outside the convex hull. Considering these features, RBFInterpolator
seems to perform best.
Evil function and scattered data
And the moment we've all been waiting for:
It's no huge surprise that interp2d
gives up. In fact, during the call to interp2d
you should expect some friendly RuntimeWarning
s complaining about the impossibility of the spline to be constructed. As for the other two methods, RBFInterpolator
seems to produce the best output, even near the borders of the domain where the result is extrapolated.
So let me say a few words about the three methods, in decreasing order of preference (so that the worst is the least likely to be read by anybody).
scipy.interpolate.RBFInterpolator
The RBF in the name of the RBFInterpolator
class stands for "radial basis functions". To be honest I've never considered this approach until I started researching for this post, but I'm pretty sure I'll be using these in the future.
Just like the spline-based methods (see later), usage comes in two steps: first one creates a callable RBFInterpolator
class instance based on the input data, and then calls this object for a given output mesh to obtain the interpolated result. Example from the smooth upsampling test:
import scipy.interpolate as interp
sparse_points = np.stack([x_sparse.ravel(), y_sparse.ravel()], -1) # shape (N, 2) in 2d
dense_points = np.stack([x_dense.ravel(), y_dense.ravel()], -1) # shape (N, 2) in 2d
zfun_smooth_rbf = interp.RBFInterpolator(sparse_points, z_sparse_smooth.ravel(),
smoothing=0, kernel='cubic') # explicit default smoothing=0 for interpolation
z_dense_smooth_rbf = zfun_smooth_rbf(dense_points).reshape(x_dense.shape) # not really a function, but a callable class instance
zfun_evil_rbf = interp.RBFInterpolator(sparse_points, z_sparse_evil.ravel(),
smoothing=0, kernel='cubic') # explicit default smoothing=0 for interpolation
z_dense_evil_rbf = zfun_evil_rbf(dense_points).reshape(x_dense.shape) # not really a function, but a callable class instance
Note that we had to do some array building gymnastics to make the API of RBFInterpolator
happy. Since we have to pass the 2d points as arrays of shape (N, 2)
, we have to flatten the input grid and stack the two flattened arrays. The constructed interpolator also expects query points in this format, and the result will be a 1d array of shape (N,)
which we have to reshape back to match our 2d grid for plotting. Since RBFInterpolator
makes no assumptions about the number of dimensions of the input points, it supports arbitrary dimensions for interpolation.
So, scipy.interpolate.RBFInterpolator
- produces well-behaved output even for crazy input data
- supports interpolation in higher dimensions
- extrapolates outside the convex hull of the input points (of course extrapolation is always a gamble, and you should generally not rely on it at all)
- creates an interpolator as a first step, so evaluating it in various output points is less additional effort
- can have output point arrays of arbitrary shape (as opposed to being constrained to rectangular meshes, see later)
- more likely to preserving the symmetry of the input data
- supports multiple kinds of radial functions for keyword
kernel
:multiquadric
,inverse_multiquadric
,inverse_quadratic
,gaussian
,linear
,cubic
,quintic
,thin_plate_spline
(the default). As of SciPy 1.7.0 the class doesn't allow passing a custom callable due to technical reasons, but this is likely to be added in a future version. - can give inexact interpolations by increasing the
smoothing
parameter
One drawback of RBF interpolation is that interpolating N
data points involves inverting an N x N
matrix. This quadratic complexity very quickly blows up memory need for a large number of data points. However, the new RBFInterpolator
class also supports a neighbors
keyword parameter that restricts computation of each radial basis function to k
nearest neighbours, thereby reducing memory need.
scipy.interpolate.griddata
My former favourite, griddata
, is a general workhorse for interpolation in arbitrary dimensions. It doesn't perform extrapolation beyond setting a single preset value for points outside the convex hull of the nodal points, but since extrapolation is a very fickle and dangerous thing, this is not necessarily a con. Usage example:
sparse_points = np.stack([x_sparse.ravel(), y_sparse.ravel()], -1) # shape (N, 2) in 2d
z_dense_smooth_griddata = interp.griddata(sparse_points, z_sparse_smooth.ravel(),
(x_dense, y_dense), method='cubic') # default method is linear
Note that the same array transformations were necessary for the input arrays as for RBFInterpolator
. The input points have to be specified in an array of shape [N, D]
in D
dimensions, or alternatively as a tuple of 1d arrays:
z_dense_smooth_griddata = interp.griddata((x_sparse.ravel(), y_sparse.ravel()),
z_sparse_smooth.ravel(), (x_dense, y_dense), method='cubic')
The output point arrays can be specified as a tuple of arrays of arbitrary dimensions (as in both above snippets), which gives us some more flexibility.
In a nutshell, scipy.interpolate.griddata
- produces well-behaved output even for crazy input data
- supports interpolation in higher dimensions
- does not perform extrapolation, a single value can be set for the output outside the convex hull of the input points (see
fill_value
) - computes the interpolated values in a single call, so probing multiple sets of output points starts from scratch
- can have output points of arbitrary shape
- supports nearest-neighbour and linear interpolation in arbitrary dimensions, cubic in 1d and 2d. Nearest-neighbour and linear interpolation use
NearestNDInterpolator
andLinearNDInterpolator
under the hood, respectively. 1d cubic interpolation uses a spline, 2d cubic interpolation usesCloughTocher2DInterpolator
to construct a continuously differentiable piecewise-cubic interpolator. - might violate the symmetry of the input data
scipy.interpolate.interp2d
/scipy.interpolate.bisplrep
The only reason I'm discussing interp2d
and its relatives is that it has a deceptive name, and people are likely to try using it. Spoiler alert: don't use it (as of scipy version 1.7.0). It's already more special than the previous subjects in that it's specifically used for two-dimensional interpolation, but I suspect this is by far the most common case for multivariate interpolation.
As far as syntax goes, interp2d
is similar to RBFInterpolator
in that it first needs constructing an interpolation instance, which can be called to provide the actual interpolated values. There's a catch, however: the output points have to be located on a rectangular mesh, so inputs going into the call to the interpolator have to be 1d vectors which span the output grid, as if from numpy.meshgrid
:
# reminder: x_sparse and y_sparse are of shape [6, 7] from numpy.meshgrid
zfun_smooth_interp2d = interp.interp2d(x_sparse, y_sparse, z_sparse_smooth, kind='cubic') # default kind is 'linear'
# reminder: x_dense and y_dense are of shape (20, 21) from numpy.meshgrid
xvec = x_dense[0,:] # 1d array of unique x values, 20 elements
yvec = y_dense[:,0] # 1d array of unique y values, 21 elements
z_dense_smooth_interp2d = zfun_smooth_interp2d(xvec, yvec) # output is (20, 21)-shaped array
One of the most common mistakes when using interp2d
is putting your full 2d meshes into the interpolation call, which leads to explosive memory consumption, and hopefully to a hasty MemoryError
.
Now, the greatest problem with interp2d
is that it often doesn't work. In order to understand this, we have to look under the hood. It turns out that interp2d
is a wrapper for the lower-level functions bisplrep
+ bisplev
, which are in turn wrappers for FITPACK routines (written in Fortran). The equivalent call to the previous example would be
kind = 'cubic'
if kind == 'linear':
kx = ky = 1
elif kind == 'cubic':
kx = ky = 3
elif kind == 'quintic':
kx = ky = 5
# bisplrep constructs a spline representation, bisplev evaluates the spline at given points
bisp_smooth = interp.bisplrep(x_sparse.ravel(), y_sparse.ravel(),
z_sparse_smooth.ravel(), kx=kx, ky=ky, s=0)
z_dense_smooth_bisplrep = interp.bisplev(xvec, yvec, bisp_smooth).T # note the transpose
Now, here's the thing about interp2d
: (in scipy version 1.7.0) there is a nice comment in interpolate/interpolate.py
for interp2d
:
if not rectangular_grid:
# TODO: surfit is really not meant for interpolation!
self.tck = fitpack.bisplrep(x, y, z, kx=kx, ky=ky, s=0.0)
and indeed in interpolate/fitpack.py
, in bisplrep
there's some setup and ultimately
tx, ty, c, o = _fitpack._surfit(x, y, z, w, xb, xe, yb, ye, kx, ky,
task, s, eps, tx, ty, nxest, nyest,
wrk, lwrk1, lwrk2)
And that's it. The routines underlying interp2d
are not really meant to perform interpolation. They might suffice for sufficiently well-behaved data, but under realistic circumstances you will probably want to use something else.
Just to conclude, interpolate.interp2d
- can lead to artifacts even with well-tempered data
- is specifically for bivariate problems (although there's the limited
interpn
for input points defined on a grid) - performs extrapolation
- creates an interpolator as a first step, so evaluating it in various output points is less additional effort
- can only produce output over a rectangular grid, for scattered output you would have to call the interpolator in a loop
- supports linear, cubic and quintic interpolation
- might violate the symmetry of the input data
I'm fairly certain that the cubic
and linear
kind of basis functions of RBFInterpolator
do not exactly correspond to the other interpolators of the same name.
These NaNs are also the reason for why the surface plot seems so odd: matplotlib historically has difficulties with plotting complex 3d objects with proper depth information. The NaN values in the data confuse the renderer, so parts of the surface that should be in the back are plotted to be in the front. This is an issue with visualization, and not interpolation.
这篇关于如何使用 scipy 执行二维插值?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!