Problem description
I was writing a new random number generator for NumPy that produces random numbers according to an arbitrary distribution when I came across this really weird behavior.
This is test.pyx:
#cython: boundscheck=False
#cython: wraparound=False
import numpy as np
cimport numpy as np
cimport cython

# Typed buffer arguments, no work done.
def BareBones(np.ndarray[double, ndim=1] a, np.ndarray[double, ndim=1] u, r):
    return u

# Untyped arguments, trivial loop over len(u).
def UntypedWithLoop(a, u, r):
    cdef int i, j = 0
    for i in range(u.shape[0]):
        j += i
    return u, j

# Same arguments as np.searchsorted; allocates and returns a new int array.
def BSReplacement(np.ndarray[double, ndim=1] a, np.ndarray[double, ndim=1] u):
    cdef np.ndarray[np.int_t, ndim=1] r = np.empty(u.shape[0], dtype=int)
    cdef int i, j = 0
    for i in range(u.shape[0]):
        j = i
    return r
setup.py
from distutils.core import setup
from Cython.Build import cythonize

setup(name="simple cython func", ext_modules=cythonize('test.pyx'))
profiling code
#!/usr/bin/python
from __future__ import division
import subprocess
import timeit
#Compile the cython modules before importing them
subprocess.call(['python', 'setup.py', 'build_ext', '--inplace'])
sstr="""
import test
import numpy
u=numpy.random.random(10)
a=numpy.random.random(10)
a=numpy.cumsum(a)
a/=a[-1]
r=numpy.empty(10,int)
"""
print "binary search: creates an array[N] and performs N binary searches to fill it:\n",timeit.timeit('numpy.searchsorted(a,u)',sstr)
print "Simple replacement for binary search:takes the same args as np.searchsorted and similarly returns a new array. this performs only one trivial operation per element:\n",timeit.timeit('test.BSReplacement(a,u)',sstr)
print "barebones function doing nothing:",timeit.timeit('test.BareBones(a,u,r)',sstr)
print "Untyped inputs and doing N iterations:",timeit.timeit('test.UntypedWithLoop(a,u,r)',sstr)
print "time for just np.empty()",timeit.timeit('numpy.empty(10,int)',sstr)
The binary search takes on the order of len(u)*log(len(a)) time to execute, while the trivial Cython function takes on the order of len(u). Both return a 1D int array of length len(u).
However, even this trivial, computation-free implementation takes longer than the full binary search in the NumPy library, which is written in C (see PyArray_SearchSorted in https://github.com/numpy/numpy/blob/202e78d607515e0390cffb1898e11807f117b36a/numpy/core/src/multiarray/item_selection.c).
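For readers unfamiliar with the setup, here is a minimal sketch of what the benchmarked numpy.searchsorted(a, u) call computes. It assumes, based on the profiling code, that a is a normalised cumulative sum used as a discrete cumulative distribution and u holds uniform draws. The seeded RandomState and the name weights are illustrative choices, not part of the original code.

```python
import numpy as np

rng = np.random.RandomState(0)      # fixed seed, only for repeatability (assumption)

weights = rng.random_sample(10)     # unnormalised weights of 10 outcomes (illustrative)
a = np.cumsum(weights)
a /= a[-1]                          # cumulative distribution, so a[-1] == 1.0

u = rng.random_sample(10)           # uniform draws in [0, 1)
idx = np.searchsorted(a, u)         # one binary search over a per element of u

# idx has one integer index per draw, each satisfying a[idx-1] < u <= a[idx]
print(idx.shape, idx.dtype)
```

Each call therefore allocates a fresh integer array of length len(u) and fills it, which is the same output that BSReplacement produces without doing the searches.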
The results are:
binary search: creates an array[N] and performs N binary searches to fill it:
1.15157485008
Simple replacement for binary search:takes the same args as np.searchsorted and similarly returns a new array. this performs only one trivial operation per element:
3.69442796707
barebones function doing nothing: 0.87496304512
Untyped inputs and doing N iterations: 0.244267940521
time for just np.empty() 1.0983929634
Why is the np.empty() step taking so much time, and what can I do to get an empty array that I can return?
The C function does the same allocation AND runs a whole bunch of sanity checks AND uses a longer algorithm in the inner loop. (I removed all the logic except the loop itself from my example.)
Update
It turns out there are two distinct problems:
- The np.empty(10) call alone has a ginormous overhead: it takes as much time as searchsorted needs to make a new array AND perform 10 binary searches on it.
- Just declaring the arguments with the buffer syntax np.ndarray[...] also has a massive overhead, which takes up MORE time than receiving the untyped variables AND iterating 50 times.
Results for 50 iterations:
binary search: 2.45336699486
Simple replacement: 3.71126317978
barebones function doing nothing: 0.924916028976
Untyped inputs and doing N iterations: 0.316384077072
time for just np.empty() 1.04949498177
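A note on reading these figures: timeit.timeit runs the statement 1,000,000 times by default, so each number above is the total in seconds for a million calls, or roughly that many microseconds per call. The sketch below makes the per-call cost explicit for the two pure NumPy statements. The call count n_calls and the output formatting are my own choices, and the timings will differ from machine to machine.

```python
import timeit

# Same setup as the profiling script, minus the compiled Cython module.
setup = """
import numpy
u = numpy.random.random(10)
a = numpy.random.random(10)
a = numpy.cumsum(a)
a /= a[-1]
"""

n_calls = 100000  # fewer than timeit's default of 1,000,000, to keep the run short

t_empty = timeit.timeit('numpy.empty(10, int)', setup, number=n_calls)
t_search = timeit.timeit('numpy.searchsorted(a, u)', setup, number=n_calls)

# Convert totals to microseconds per call so fixed overheads are easy to compare.
print("np.empty(10, int):  %.3f us per call" % (1e6 * t_empty / n_calls))
print("np.searchsorted:    %.3f us per call" % (1e6 * t_search / n_calls))
```

At array sizes of 10 or 50, costs of this magnitude are dominated by fixed per-call work (argument parsing, object creation) rather than by the loop over the elements.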
There is a discussion of this on the Cython list that might have some useful suggestions: https://groups.google.com/forum/#!topic/cython-users/CwtU_jYADgM
Generally, though, I try to allocate small arrays outside of Cython, pass them in, and re-use them in subsequent calls to the method. I understand that this is not always an option.
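As a rough illustration of that pattern, here is a plain Python sketch in which the caller allocates the result array once and the function only fills it. The function name bs_replacement_into is hypothetical, and the loop body is a placeholder for the real per-element work. In a real module the same signature would be compiled with Cython.

```python
import numpy as np

def bs_replacement_into(a, u, r):
    """Hypothetical variant of BSReplacement that fills a caller-supplied array."""
    n = u.shape[0]
    if r.shape[0] < n:
        raise ValueError("output buffer is too small")
    for i in range(n):
        r[i] = i  # placeholder for the real per-element computation
    return r

a = np.cumsum(np.random.random(10))
a /= a[-1]
r = np.empty(10, dtype=int)  # allocated once, outside the repeated calls

for _ in range(3):
    u = np.random.random(10)
    out = bs_replacement_into(a, u, r)  # no new array is created per call
```

The trade-off is that every call overwrites the previous result, so the caller has to copy r if earlier values are still needed.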