Question
I planned to use xarray extensively in some numerically intensive scientific code that I am writing. So far, it makes the code very elegant, but I think I will have to abandon it as the performance cost is far too high.
Here is an example, which creates two arrays and multiplies parts of them together using xarray (with several indexing schemes), and numpy. I used num_comp=2 and num_x=10000:
Line # Hits Time Per Hit % Time Line Contents
4 @profile
5 def xr_timing(num_comp, num_x):
6 1 4112 4112.0 10.1 da1 = xr.DataArray(np.random.random([num_comp, num_x]).astype(np.float32), dims=['component', 'x'], coords={'component': ['a', 'b'], 'x': np.linspace(0, 1, num_x)})
7 1 438 438.0 1.1 da2 = da1.copy()
8 1 1398 1398.0 3.4 da2[:] = np.random.random([num_comp, num_x]).astype(np.float32)
9 1 7148 7148.0 17.6 da3 = da1.isel(component=0).drop('component') * da2.isel(component=0).drop('component')
10 1 6298 6298.0 15.5 da4 = da1[dict(component=0)].drop('component') * da2[dict(component=0)].drop('component')
11 1 7541 7541.0 18.6 da5 = da1.sel(component='a').drop('component') * da2.sel(component='a').drop('component')
12 1 7184 7184.0 17.7 da6 = da1.loc[dict(component='a')].drop('component') * da2.loc[dict(component='a')].drop('component')
13 1 6479 6479.0 16.0 da7 = da1[0, :].drop('component') * da2[0, :].drop('component')
15 @profile
16 def np_timing(num_comp, num_x):
17 1 1027 1027.0 50.2 da1 = np.random.random([num_comp, num_x]).astype(np.float32)
18 1 977 977.0 47.8 da2 = np.random.random([num_comp, num_x]).astype(np.float32)
19 1 41 41.0 2.0 da3 = da1[0, :] * da2[0, :]
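For reference, here are the two profiled functions reconstructed as a standalone script (the code is taken directly from the "Line Contents" column above; the line_profiler @profile decorators are omitted so the snippet runs as-is):

import numpy as np
import xarray as xr

def xr_timing(num_comp, num_x):
    # Two float32 DataArrays with labelled 'component' and 'x' dimensions.
    da1 = xr.DataArray(np.random.random([num_comp, num_x]).astype(np.float32),
                       dims=['component', 'x'],
                       coords={'component': ['a', 'b'], 'x': np.linspace(0, 1, num_x)})
    da2 = da1.copy()
    da2[:] = np.random.random([num_comp, num_x]).astype(np.float32)
    # Five equivalent ways of selecting the first component and multiplying.
    da3 = da1.isel(component=0).drop('component') * da2.isel(component=0).drop('component')
    da4 = da1[dict(component=0)].drop('component') * da2[dict(component=0)].drop('component')
    da5 = da1.sel(component='a').drop('component') * da2.sel(component='a').drop('component')
    da6 = da1.loc[dict(component='a')].drop('component') * da2.loc[dict(component='a')].drop('component')
    da7 = da1[0, :].drop('component') * da2[0, :].drop('component')

def np_timing(num_comp, num_x):
    # The same selection and multiplication on plain NumPy arrays.
    da1 = np.random.random([num_comp, num_x]).astype(np.float32)
    da2 = np.random.random([num_comp, num_x]).astype(np.float32)
    da3 = da1[0, :] * da2[0, :]

xr_timing(2, 10000)
np_timing(2, 10000)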
The fastest xarray multiplication takes about 150X the time of the numpy version. This is just one of the operations in my code, but I find most of them are many times slower than the numpy equivalent, which is unfortunate as xarray makes the code so much clearer. Am I doing something wrong?
Update: Even da1[0, :].values * da2[0, :].values (which loses many of the benefits of using xarray) takes 2464 time units.
I am using xarray 0.9.6, pandas 0.21.0, numpy 1.13.3, and Python 3.5.2.
Update 2: As requested by @Maximilian, here is a re-run with num_x=1000000:
Line # Hits Time Per Hit % Time Line Contents
# xarray
9 5 408596 81719.2 11.3 da3 = da1.isel(component=0).drop('component') * da2.isel(component=0).drop('component')
10 5 407003 81400.6 11.3 da4 = da1[dict(component=0)].drop('component') * da2[dict(component=0)].drop('component')
11 5 411248 82249.6 11.4 da5 = da1.sel(component='a').drop('component') * da2.sel(component='a').drop('component')
12 5 411730 82346.0 11.4 da6 = da1.loc[dict(component='a')].drop('component') * da2.loc[dict(component='a')].drop('component')
13 5 406757 81351.4 11.3 da7 = da1[0, :].drop('component') * da2[0, :].drop('component')
14 5 48800 9760.0 1.4 da8 = da1[0, :].values * da2[0, :].values
# numpy
20 5 37476 7495.2 2.9 da3 = da1[0, :] * da2[0, :]
The performance difference has decreased substantially, as expected (only about 10X slower now), but I am still glad that the issue will be mentioned in the next release of the documentation as even this amount of difference may surprise some people.
Recommended answer
Yes, this is a known limitation for xarray. Performance-sensitive code that uses small arrays is much slower with xarray than with NumPy. I wrote a new section about this in our docs for the next version: http://xarray.pydata.org/en/stable/computation.html#wrapping-custom-computation
You basically have two options:
- Write your performance-sensitive code on unwrapped arrays, and then wrap them back in xarray data structures. Xarray v0.10 has a new helper function (apply_ufunc) that makes this a little easier; see the link above if you are interested, and the sketch after this list.
- Use something other than xarray/Python to do your computation. This could also make sense because Python itself adds significant overhead. Julia's AxisArrays.jl looks interesting, though I haven't tried it myself.
I suppose option 3 would be to rewrite xarray itself in C++ (e.g., on top of xtensor), but that would be much more involved!