Creating small arrays in Cython takes a humongous amount of time


Problem description


I was writing a new random number generator for numpy that produces random numbers according to an arbitrary distribution when I came across this really weird behavior:

This is test.pyx:

#cython: boundscheck=False
#cython: wraparound=False
import numpy as np
cimport numpy as np
cimport cython

def BareBones(np.ndarray[double, ndim=1] a,np.ndarray[double, ndim=1] u,r):
    return u

def UntypedWithLoop(a,u,r):
    cdef int i,j=0
    for i in range(u.shape[0]):
        j+=i
    return u,j

def BSReplacement(np.ndarray[double, ndim=1] a, np.ndarray[double, ndim=1] u):
    cdef np.ndarray[np.int_t, ndim=1] r=np.empty(u.shape[0],dtype=int)
    cdef int i,j=0
    for i in range(u.shape[0]):
        j=i
    return r

setup.py

from distutils.core import setup
from Cython.Build import cythonize
setup(name = "simple cython func",ext_modules = cythonize('test.pyx'),)

Profiling code:

#!/usr/bin/python
from __future__ import division, print_function

import subprocess
import timeit

# Compile the Cython module before importing it
subprocess.call(['python', 'setup.py', 'build_ext', '--inplace'])

# Setup code shared by every timing: 10-element inputs, where `a` is a
# normalized cumulative sum (so it is sorted and ends at 1.0)
sstr="""
import test
import numpy
u=numpy.random.random(10)
a=numpy.random.random(10)
a=numpy.cumsum(a)
a/=a[-1]
r=numpy.empty(10,int)
"""

print("binary search: creates an array[N] and performs N binary searches to fill it:")
print(timeit.timeit('numpy.searchsorted(a,u)',sstr))
print("Simple replacement for binary search:takes the same args as np.searchsorted and similarly returns a new array. this performs only one trivial operation per element:")
print(timeit.timeit('test.BSReplacement(a,u)',sstr))

print("barebones function doing nothing:",timeit.timeit('test.BareBones(a,u,r)',sstr))
print("Untyped inputs and doing N iterations:",timeit.timeit('test.UntypedWithLoop(a,u,r)',sstr))
print("time for just np.empty()",timeit.timeit('numpy.empty(10,int)',sstr))

The binary search implementation takes on the order of len(u)*log(len(a)) time to execute. The trivial Cython function takes on the order of len(u) time to run. Both return a 1D int array of length len(u).

However, even this trivial implementation, which does no computation at all, takes longer than the full binary search in the NumPy library. (That one is written in C: https://github.com/numpy/numpy/blob/202e78d607515e0390cffb1898e11807f117b36a/numpy/core/src/multiarray/item_selection.c see PyArray_SearchSorted)

The results are:

binary search: creates an array[N] and performs N binary searches to fill it:
1.15157485008
Simple replacement for binary search:takes the same args as np.searchsorted and similarly returns a new array. this performs only one trivial operation per element:
3.69442796707
barebones function doing nothing: 0.87496304512
Untyped inputs and doing N iterations: 0.244267940521
time for just np.empty() 1.0983929634
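
For scale: timeit.timeit() runs the statement 1,000,000 times by default, and the profiling code above does not override that, so each total in seconds is numerically the per-call cost in microseconds. A small sketch of that conversion, using only the numbers reported above (the helper name per_call_microseconds is made up for illustration):

```python
# timeit.timeit() executes the statement this many times unless told otherwise
DEFAULT_NUMBER = 1000000

# Totals in seconds, copied from the results reported above
totals_seconds = {
    "numpy.searchsorted(a,u)": 1.15157485008,
    "test.BSReplacement(a,u)": 3.69442796707,
    "test.BareBones(a,u,r)": 0.87496304512,
    "test.UntypedWithLoop(a,u,r)": 0.244267940521,
    "numpy.empty(10,int)": 1.0983929634,
}


def per_call_microseconds(total_seconds, number=DEFAULT_NUMBER):
    """Hypothetical helper: average cost of one call, in microseconds."""
    return total_seconds / number * 1e6


per_call = {name: per_call_microseconds(t) for name, t in totals_seconds.items()}

for name, micros in sorted(per_call.items(), key=lambda item: item[1]):
    print("%-30s %.2f us per call" % (name, micros))
```

Read this way, every function here costs on the order of a microsecond per call, which is why fixed per-call overhead (allocation, argument checking) dominates the 10-element loop itself.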

Why is the np.empty() step taking so much time? And what can I do to get an empty array that I can return?

The C function does this AND runs a whole bunch of sanity checks AND uses a longer algorithm in the inner loop. (I removed all the logic except the loop itself from my example.)


Update

It turns out there are two distinct problems:

  1. The np.empty(10) call alone has a ginormous overhead: it takes as much time as searchsorted needs to make a new array AND perform 10 binary searches on it.
  2. Just declaring the arguments with the buffer syntax np.ndarray[...] also has a massive overhead, which takes up MORE time than receiving the untyped variables AND iterating 50 times.

Results for 50 iterations:

binary search: 2.45336699486
Simple replacement: 3.71126317978
barebones function doing nothing: 0.924916028976
Untyped inputs and doing N iterations: 0.316384077072
time for just np.empty() 1.04949498177

Solution

There is a discussion of this on the Cython list that might have some useful suggestions: https://groups.google.com/forum/#!topic/cython-users/CwtU_jYADgM

Generally, though, I try to allocate small arrays outside of Cython, pass them in, and re-use them in subsequent calls to the method. I understand that this is not always an option.
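
A minimal sketch of that calling convention, written in plain Python so it runs without a build step. The helper fill_indices and the sample values are hypothetical and not from the question; in Cython the same signature would carry typed arguments. It shows only the pattern (the caller owns the output buffer and passes it in on every call), not the speed-up:

```python
from array import array
from bisect import bisect_left


def fill_indices(a, u, out):
    """Hypothetical helper: store in out[i] the insertion index of u[i] in sorted a.

    Nothing is allocated here; the caller supplies `out` and gets it back.
    """
    for i, x in enumerate(u):
        out[i] = bisect_left(a, x)
    return out


# Sorted, normalized cumulative weights (made-up values for illustration)
a = array("d", [0.1, 0.4, 0.7, 1.0])

# Allocate the result buffer once, outside the function that is called repeatedly
out = array("l", [0] * 3)

first = fill_indices(a, array("d", [0.05, 0.5, 0.95]), out)
print(list(first))   # [0, 2, 3]

# A later call overwrites the same buffer instead of creating a new one
second = fill_indices(a, array("d", [0.2, 0.4, 1.0]), out)
print(list(second))  # [1, 1, 3]
```

A preallocated NumPy array, such as the r=numpy.empty(10,int) already created in the profiling setup, can be passed as out in the same way, since it supports item assignment. The caveat is that each call overwrites the previous result, so the caller has to copy any values it wants to keep.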
