Problem description
I want to compile a Python function with Cython that reads a binary file while skipping some records (reading the whole file and then slicing is not an option, as I would run out of memory). I came up with something like this:
def FromFileSkip(fid, count=1, skip=0):
    if skip >= 0:
        data = numpy.zeros(count)
        k = 0
        while k < count:
            try:
                data[k] = numpy.fromfile(fid, count=1, dtype=dtype)
                fid.seek(skip, 1)
                k += 1
            except ValueError:
                data = data[:k]
                break
        return data
and then I can use the function like this:
f = open(filename, 'rb')  # binary mode, since the file holds raw records
data = FromFileSkip(f,...
However, to compile the function "FromFileSkip" with Cython, I would like to declare all the types involved in the function, including "fid", the file handle. How can I declare its type in Cython, given that it is not a "standard" type such as an integer? Thanks.
Solution
Defining the type of fid won't help, because calling Python functions is still costly. Try compiling your example with the "-a" flag to see what I mean. However, you can use low-level C functions for file handling to avoid Python overhead in your loop. For the sake of example, I assume that the data starts right at the beginning of the file and that its type is double:
from libc.stdio cimport *

cdef extern from "stdio.h":
    FILE *fdopen(int, const char *)

import numpy as np
cimport numpy as np

DTYPE = np.double  # or whatever your type is
ctypedef np.double_t DTYPE_t  # or whatever your type is

def FromFileSkip(fid, int count=1, int skip=0):
    cdef int k
    cdef int n_read = 0  # number of items actually read
    cdef FILE* cfile
    cdef np.ndarray[DTYPE_t, ndim=1] data
    cdef DTYPE_t* data_ptr

    cfile = fdopen(fid.fileno(), 'rb')  # attach the stream
    data = np.zeros(count, dtype=DTYPE)
    data_ptr = <DTYPE_t*>data.data

    # maybe skip some header bytes here
    # ...

    for k in range(count):
        # fread returns the number of items read (a size_t, never negative),
        # so anything below 1 means end of file or a read error
        if fread(<void*>(data_ptr + k), sizeof(DTYPE_t), 1, cfile) < 1:
            break
        n_read += 1
        if fseek(cfile, skip, SEEK_CUR):
            break
    return data[:n_read]  # keep only the items that were read
Note that the output of cython -a example.pyx shows no Python overhead inside the loop.
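To check the compiled function on a small file, it can help to have a slow pure-Python reference with the same read-then-skip semantics. The sketch below is only an illustration: the names from_file_skip_reference and demo are hypothetical, it assumes native-endian 8-byte doubles, and it uses the standard-library struct and array modules instead of numpy so that it runs without extra dependencies. Like the question's version, it returns only the values that were actually read.

```python
import os
import struct
import tempfile
from array import array


def from_file_skip_reference(fid, count=1, skip=0):
    """Read one double, then skip `skip` bytes, at most `count` times."""
    itemsize = struct.calcsize("d")  # size of a native C double
    data = array("d")
    while len(data) < count:
        raw = fid.read(itemsize)
        if len(raw) < itemsize:  # end of file, or a truncated record
            break
        data.append(struct.unpack("d", raw)[0])
        fid.seek(skip, os.SEEK_CUR)  # skip relative to the current position
    return data


def demo():
    # Write the ten doubles 0.0 .. 9.0 to a temporary file.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(array("d", range(10)).tobytes())
        path = tmp.name
    try:
        with open(path, "rb") as f:
            # Skipping two doubles after each read keeps every third value.
            skip = 2 * struct.calcsize("d")
            return list(from_file_skip_reference(f, count=10, skip=skip))
    finally:
        os.remove(path)
```

Running the Cython FromFileSkip on the same file with the same count and skip should give the same values, which makes for a quick sanity check before moving to large files.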