为什么scipy线性插值比最近邻插值运行得更快

为什么scipy线性插值比最近邻插值运行得更快

本文介绍了为什么scipy线性插值比最近邻插值运行得更快?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我编写了一个例程,该例程将点数据插值到常规网格中.但是,我发现scipy的最近邻插值的执行速度几乎是我用于线性插值(scipy.interpolate.Rbf)的径向基函数的两倍慢

I've written a routine that interpolates point data onto a regular grid. However, I find that scipy's implementation of nearest neighbor interpolation performs almost twice as slow as the radial basis function I'm using for linear interpolation (scipy.interpolate.Rbf)

相关代码包括如何构造插值器

Relevant code includes how the interpolators are constructed

if interpolation_mode == 'linear':
    interpolator = scipy.interpolate.Rbf(
        point_array[:, 0], point_array[:, 1], value_array,
        function='linear', smooth=.01)
elif interpolation_mode == 'nearest':
    interpolator = scipy.interpolate.NearestNDInterpolator(
        point_array, value_array)

调用插值时

result = interpolator(col_coords.ravel(), row_coords.ravel())

我正在运行的样本具有27个输入插值值点,并且我正在将近20000 X 20000的网格上进行插值. (我正在按照内存块大小来执行此操作,所以我不会使计算机爆炸.)

The sample I'm running on has 27 input interpolant value points and I'm interpolating across nearly a 20000 X 20000 grid. (I'm doing this in memory block sizes so I'm not exploding the computer btw.)

下面是我在相关代码上运行的两个cProfile的结果.请注意,最接近的邻居方案运行406秒,而线性方案运行256秒.最接近的方案由调用scipy的kdTree控制,这似乎是合理的,但rbf的性能要好得多.有什么想法为什么或可以做些什么来使我的最接近的方案比线性方案运行得更快?

Below are the result of two cProfiles I've run on the relevant code. Note that the nearest neighbor scheme runs in 406 seconds while the linear scheme runs in 256 seconds. The nearest scheme is dominated by calls to scipy's kdTree, which seems reasonable, except the rbf outperforms it by a significant amount of time. Any ideas why or what I could do to make my nearest scheme run faster than linear?

线性运行:

     25362 function calls in 225.886 seconds

   Ordered by: internal time
   List reduced from 328 to 10 due to restriction <10>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      253  169.302    0.669  207.516    0.820 C:\Python27\lib\site-packages\scipy\interpolate\rbf.py:112(
_euclidean_norm)
      258   38.211    0.148   38.211    0.148 {method 'reduce' of 'numpy.ufunc' objects}
      252    6.069    0.024    6.069    0.024 {numpy.core._dotblas.dot}
        1    5.077    5.077  225.332  225.332 C:\Python27\lib\site-packages\pygeoprocessing-0.3.0a8.post2
8+n5b1ee2de0d07-py2.7-win32.egg\pygeoprocessing\geoprocessing.py:333(interpolate_points_uri)
      252    1.849    0.007    2.137    0.008 C:\Python27\lib\site-packages\numpy\lib\function_base.py:32
85(meshgrid)
      507    1.419    0.003    1.419    0.003 {method 'flatten' of 'numpy.ndarray' objects}
     1268    1.368    0.001    1.368    0.001 {numpy.core.multiarray.array}
      252    1.018    0.004    1.018    0.004 {_gdal_array.BandRasterIONumPy}
        1    0.533    0.533  225.886  225.886 pygeoprocessing\tests\helper_driver.py:10(interpolate)
      252    0.336    0.001  216.716    0.860 C:\Python27\lib\site-packages\scipy\interpolate\rbf.py:225(
__call__)

最近的邻居跑步:

     27539 function calls in 405.624 seconds

   Ordered by: internal time
   List reduced from 309 to 10 due to restriction <10>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      252  397.806    1.579  397.822    1.579 {method 'query' of 'ckdtree.cKDTree' objects}
      252    1.875    0.007    1.881    0.007 {scipy.interpolate.interpnd._ndim_coords_from_arrays}
      252    1.831    0.007    2.101    0.008 C:\Python27\lib\site-packages\numpy\lib\function_base.py:3285(meshgrid)
      252    1.034    0.004  400.739    1.590 C:\Python27\lib\site-packages\scipy\interpolate\ndgriddata.py:60(__call__)
        1    1.009    1.009  405.030  405.030 C:\Python27\lib\site-packages\pygeoprocessing-0.3.0a8.post28+n5b1ee2de0d07-py2.7-win32.egg\pygeoprocessing\geoprocessing.py:333(interpolate_points_uri)
      252    0.719    0.003    0.719    0.003 {_gdal_array.BandRasterIONumPy}
        1    0.509    0.509  405.624  405.624 pygeoprocessing\tests\helper_driver.py:10(interpolate)
      252    0.261    0.001    0.261    0.001 {numpy.core.multiarray.copyto}
       27    0.125    0.005    0.125    0.005 {_ogr.Layer_CreateFeature}
        1    0.116    0.116    0.254    0.254 C:\Python27\lib\site-packages\pygeoprocessing-0.3.0a8.post28+n5b1ee2de0d07-py2.7-win32.egg\pygeoprocessing\geoprocessing.py:362(_parse_point_data)

作为参考,我还包括了这两个测试用例的视觉结果.

For reference, I'm also including the visual result of these two test cases.

最近

线性

推荐答案

griddata文档中运行示例:

In [47]: def func(x, y):
          return x*(1-x)*np.cos(4*np.pi*x) * np.sin(4*np.pi*y**2)**2
   ....:
In [48]: points = np.random.rand(1000, 2)
In [49]: values = func(points[:,0], points[:,1])
In [50]: grid_x, grid_y = np.mgrid[0:1:100j, 0:1:200j]

因此,我们有1000个分散点,并将在20,000个点进行内插.

So we have 1000 scattered points, and will interpolate at 20,000.

In [52]: timeit interpolate.griddata(points, values, (grid_x, grid_y),
    method='nearest')
10 loops, best of 3: 83.6 ms per loop

In [53]: timeit interpolate.griddata(points, values, (grid_x, grid_y),
    method='linear')
1 loops, best of 3: 24.6 ms per loop

In [54]: timeit interpolate.griddata(points, values, (grid_x, grid_y),
    method='cubic')
10 loops, best of 3: 42.7 ms per loop

以及2级插值器:

In [55]: %%timeit
rbfi = interpolate.Rbf(points[:,0],points[:,1],values)
dl = rbfi(grid_x.ravel(),grid_y.ravel())
   ....:
1 loops, best of 3: 3.89 s per loop

In [56]: %%timeit
ndi=interpolate.NearestNDInterpolator(points, values)
dl=ndi(grid_x.ravel(),grid_y.ravel())
   ....:
10 loops, best of 3: 82.6 ms per loop

In [57]: %%timeit
ldi=interpolate.LinearNDInterpolator(points, values)
dl=ldi(grid_x.ravel(),grid_y.ravel())
 ....
10 loops, best of 3: 25.1 ms per loop

griddata实际上是这两个版本的第一步.

griddata is actually a 1 step cover call for these last two versions.

griddata将其方法描述为:

nearest
return the value at the data point closest to the point of
   interpolation. See NearestNDInterpolator for more details.
   Uses scipy.spatial.cKDTree

linear
tesselate the input point set to n-dimensional simplices,
   and interpolate linearly on each simplex.
   LinearNDInterpolator details are:
      The interpolant is constructed by triangulating the
      input data with Qhull [R37], and on each triangle
      performing linear barycentric interpolation.

cubic (2-D)
return the value determined from a piecewise cubic, continuously
   differentiable (C1), and approximately curvature-minimizing
   polynomial surface. See CloughTocher2DInterpolator for more details.

在2个阶段的版本上进行的进一步测试表明,建立最近的cKTtree的速度非常快;大部分时间都花在第二次插值状态.

Further tests on the 2 stage versions shows that setting up the nearest cKTtree is very fast; most of the time is spent in the 2nd interpolation state.

另一方面,设置三角表面比线性插值要花费更长的时间.

On the other hand, setting up the triangulated surface takes longer than the linear interpolation.

我对Rbf方法了解不多,为什么这么慢呢?基本方法是如此不同,以至于用简单的手动插值方法开发的直觉并没有多大意义.

I don't know enough of the Rbf method to say why that is so much slower. The underlying methods are so different that intuitions developed with simple manual methods of interpolation don't mean much.

您的示例从较少的散布点开始,然后插值到更精细的网格上.

Your example starts with fewer scattered points, and interpolates on a much finer grid.

这篇关于为什么scipy线性插值比最近邻插值运行得更快?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!

08-20 03:14