Passing a file handle to a Cython function


Problem description


I want to compile a Python function with Cython that reads a binary file while skipping some records (without reading the whole file and then slicing, as I would run out of memory). I can come up with something like this:

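    import numpy

    dtype = numpy.float64  # assumed element type; not defined in the original snippet

    def FromFileSkip(fid, count=1, skip=0):
        if skip >= 0:
            data = numpy.zeros(count, dtype=dtype)
            k = 0
            while k < count:
                try:
                    # at end of file fromfile returns an empty array,
                    # and the assignment raises ValueError
                    data[k] = numpy.fromfile(fid, count=1, dtype=dtype)
                    fid.seek(skip, 1)  # skip bytes relative to the current position
                    k += 1
                except ValueError:
                    data = data[:k]
                    break
            return data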

and then I can use the function like this:

    f = open(filename, 'rb')  # binary mode
    data = FromFileSkip(f,...

However, to compile FromFileSkip with Cython, I would like to declare the types of everything involved in the function, including "fid", the file handle. How can I declare its type in Cython, given that it is not a "standard" type such as an integer? Thanks.

Solution

Declaring a type for fid won't help, because calling Python functions is still costly. Try compiling your example with the "-a" flag to see what I mean. You can, however, use low-level C functions for the file handling to avoid Python overhead in your loop. For the sake of the example, I assume that the data starts right at the beginning of the file and that its type is double:

    from libc.stdio cimport *

    cdef extern from "stdio.h":
        FILE *fdopen(int, const char *)

    import numpy as np
    cimport numpy as np

    DTYPE = np.double  # or whatever your type is
    ctypedef np.double_t DTYPE_t  # or whatever your type is

    def FromFileSkip(fid, int count=1, int skip=0):
        cdef int k
        cdef int n_read = 0  # number of records actually read
        cdef FILE* cfile
        cdef np.ndarray[DTYPE_t, ndim=1] data
        cdef DTYPE_t* data_ptr

        cfile = fdopen(fid.fileno(), 'rb')  # attach a C stream to the descriptor
        if cfile == NULL:
            raise IOError("fdopen failed")
        data = np.zeros(count, dtype=DTYPE)
        data_ptr = <DTYPE_t*>data.data

        # maybe skip some header bytes here
        # ...

        for k in range(count):
            # fread returns the number of items read and is never negative,
            # so anything other than 1 means end of file or an error
            if fread(<void*>(data_ptr + k), sizeof(DTYPE_t), 1, cfile) != 1:
                break
            n_read += 1
            if fseek(cfile, skip, SEEK_CUR):
                break

        return data[:n_read]  # drop the slots that were never filled
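One caveat when attaching a C stream to fid.fileno(): Python's buffered file objects read ahead, so if anything was read through fid beforehand, the OS-level offset of the descriptor can be further along than fid.tell(). The following is a minimal pure-Python sketch of that effect and of one possible way to resynchronize the descriptor before handing it to C. The scratch file and the 8-byte "header" read are made up for illustration.

```python
import os
import struct
import tempfile

# Write 100 doubles (0.0, 1.0, ..., 99.0) to a scratch file (hypothetical data).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(struct.pack("<100d", *[float(i) for i in range(100)]))
    path = tmp.name

with open(path, "rb") as f:
    f.read(8)                      # pretend this is a header read done in Python
    logical_pos = f.tell()         # position as the Python file object sees it
    # Position of the underlying descriptor; usually further along
    # than logical_pos because the buffered reader has read ahead.
    fd_pos = os.lseek(f.fileno(), 0, os.SEEK_CUR)

    # Move the descriptor back to the logical position before using it from C.
    os.lseek(f.fileno(), logical_pos, os.SEEK_SET)
    (next_value,) = struct.unpack("<d", os.read(f.fileno(), 8))

os.remove(path)
```

After the resync, a raw read on the descriptor continues exactly where the Python-level read stopped, which is what C code attached to the same descriptor would see.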

Note that the annotated output of cython -a example.pyx shows no Python overhead inside the loop of FromFileSkip.
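To check the compiled function against known data, a plain-Python reference with the same read-then-skip pattern can be handy. The sketch below is only a stand-in for testing, not the answer's code: read_skip_reference and the scratch file are hypothetical names and data, it uses only struct and file seeks, and it assumes records are native-endian 8-byte doubles.

```python
import os
import struct
import tempfile

ITEM = struct.Struct("=d")  # one native-endian 8-byte double (assumed record type)

def read_skip_reference(f, count=1, skip=0):
    """Read up to `count` doubles from binary file `f`, skipping `skip` bytes after each."""
    values = []
    for _ in range(count):
        chunk = f.read(ITEM.size)
        if len(chunk) < ITEM.size:   # end of file or truncated record
            break
        values.append(ITEM.unpack(chunk)[0])
        f.seek(skip, os.SEEK_CUR)    # relative seek, like fseek(..., SEEK_CUR)
    return values

# Scratch file holding 0.0, 1.0, ..., 9.0
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"".join(ITEM.pack(float(i)) for i in range(10)))
    path = tmp.name

with open(path, "rb") as f:
    every_other = read_skip_reference(f, count=5, skip=ITEM.size)
with open(path, "rb") as f:
    short_read = read_skip_reference(f, count=20, skip=3 * ITEM.size)

os.remove(path)
```

Comparing the array returned by FromFileSkip with the list from this reference on the same small file, using the same count and skip, is one quick sanity check. The second call also exercises the case where the file ends before count records have been read.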


08-06 05:54