Python ctypes: How to pass row outputs from a C function into a pandas DataFrame?

Question

My question is how to parse tab-delimited output from a C function into a pandas DataFrame via ctypes.

I am writing a Python 3.x wrapper around a C library using ctypes. The C library currently performs database queries. The C function I am accessing, return_query(), returns tab-delimited rows from a query, given the path to a file, an index, and a query string:

int return_query(structname **output, const char *input_file,
                 const char *index, const char *query_string);

As you can see, output is the location where all records from the query are stored, and structname is the struct type for a row.

I also have a function that prints to STDOUT:

int print_query(const char *input_file,
                const char *index, const char *query_string);

My goal is to access these functions via ctypes and pass the tab-delimited row output into a pandas DataFrame.

My questions are as follows:

(1) I could try to parse the STDOUT of print_query(). However, these queries could produce large tab-delimited results, and I worry that this approach is inefficient and might not scale to tens of thousands of rows. Other questions have roughly covered how to capture STDOUT from C functions in Python via ctypes:

Capturing print output from shared library called from python with ctypes module

(2) Could I access output somehow and pass it to a pandas DataFrame? I'm currently not sure how this would work, e.g.

import ctypes

lib = ctypes.CDLL("../libshared.so")  # reference to the shared library (*.so)

lib.return_query.restype = ctypes.c_char
lib.return_query.argtypes = (???, ctypes.c_char_p, ctypes.c_char_p, ctypes.c_char_p)

What should the first argument be, and how would I pass it into something that could become a pandas DataFrame?

(3) Perhaps it would be better to rewrite the C functions so that, instead of returning tab-delimited rows, they return something more accessible via ctypes?
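Option (1) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the asker's code: capture_fd_stdout and fake_print_query are hypothetical names, and fake_print_query stands in for lib.print_query(...) by writing directly to file descriptor 1, the way C code does. With a real C library, any output still sitting in the C stdio buffer would also have to be flushed (for example with fflush on the C side) before stdout is restored.

```python
import os
import sys
import tempfile


def capture_fd_stdout(func, *args):
    """Run func(*args) and return what was written to file descriptor 1, decoded as UTF-8."""
    sys.stdout.flush()                  # push out Python-level buffered text first
    saved_fd = os.dup(1)                # keep a handle on the real stdout
    with tempfile.TemporaryFile() as tmp:
        try:
            os.dup2(tmp.fileno(), 1)    # fd 1 now points at the temporary file
            func(*args)
        finally:
            os.dup2(saved_fd, 1)        # restore stdout even if func raises
            os.close(saved_fd)
        tmp.seek(0)
        return tmp.read().decode("utf-8")


def fake_print_query():
    # Hypothetical stand-in for lib.print_query(...): writes straight to fd 1.
    os.write(1, b"1\talpha\n2\tbeta\n")


text = capture_fd_stdout(fake_print_query)
```

The captured text could then be handed to pandas.read_csv(io.StringIO(text), sep="\t", header=None). Since everything passes through a temporary file and is then read into memory as one string, this approach is simple but does carry the overhead the question worries about for large results.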

Recommended answer

I was going to post this as a comment, but Stack Overflow blocked me from doing so.

1- A pandas object is passed to C functions as a PyObject *, so lib.return_query.argtypes = (ctypes.c_void_p, ctypes.c_char_p, ctypes.c_char_p, ctypes.c_char_p)

2- If you are returning tab-delimited rows, that sounds more like ctypes.c_char_p than lib.return_query.restype = ctypes.c_char, and your function int return_query should then be char *return_query.
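For the first argument of return_query(), a typed alternative to ctypes.c_void_p is to mirror the C struct with a ctypes.Structure and pass a pointer-to-pointer by reference. The sketch below rests on assumptions the question does not confirm: the Row fields are invented for illustration, and return_query() is assumed to allocate an array of structs, store its address in *output, and return the number of rows. The demo array at the end only simulates what the library would hand back.

```python
import ctypes


class Row(ctypes.Structure):
    # Hypothetical layout; it must match the real C struct field for field.
    _fields_ = [
        ("id", ctypes.c_int),
        ("name", ctypes.c_char_p),
        ("score", ctypes.c_double),
    ]


def declare_return_query(lib):
    """Attach the assumed signature to lib.return_query."""
    lib.return_query.restype = ctypes.c_int  # assumed: row count or error code
    lib.return_query.argtypes = (
        ctypes.POINTER(ctypes.POINTER(Row)),  # structname **output
        ctypes.c_char_p, ctypes.c_char_p, ctypes.c_char_p,
    )


def rows_to_records(row_ptr, n_rows):
    """Copy n_rows structs starting at row_ptr into plain Python dicts."""
    records = []
    for i in range(n_rows):
        row = row_ptr[i]
        records.append({
            "id": row.id,
            "name": row.name.decode("utf-8") if row.name else None,
            "score": row.score,
        })
    return records


# Simulated result, standing in for what the C library would write to *output.
demo = (Row * 2)(Row(1, b"alpha", 0.5), Row(2, b"beta", 1.5))
records = rows_to_records(ctypes.cast(demo, ctypes.POINTER(Row)), 2)

# With the real library (assumed behaviour), the call would look like:
#   output = ctypes.POINTER(Row)()
#   n = lib.return_query(ctypes.byref(output), b"file", b"index", b"query")
#   df = pandas.DataFrame(rows_to_records(output, n))
```

Copying into dicts keeps the DataFrame independent of memory owned by the C library, which would still need to be released through whatever free function the library provides.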

These are comments and observations, not a full answer.
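If the C function were rewritten to return char * as point 2 suggests, setting restype to ctypes.c_char_p would make ctypes hand back a Python bytes object, which can be split into rows. The sketch below is hedged accordingly: parse_tab_delimited and to_dataframe are hypothetical helper names, and the literal bytes stand in for the value the rewritten function would return.

```python
import csv
import io


def parse_tab_delimited(raw):
    """Split a bytes buffer of tab-delimited lines into a list of string rows."""
    text = raw.decode("utf-8")
    reader = csv.reader(
        io.StringIO(text, newline=""),
        delimiter="\t",
        quoting=csv.QUOTE_NONE,  # treat quote characters as ordinary data
    )
    return list(reader)


def to_dataframe(rows, columns=None):
    # pandas is imported here so the parser above stays usable without it.
    import pandas as pd
    return pd.DataFrame(rows, columns=columns)


# Stand-in for the bytes a char *-returning return_query() would produce.
rows = parse_tab_delimited(b"1\talpha\n2\tbeta\n")
```

One caveat with this design: when restype is ctypes.c_char_p, ctypes copies the string and discards the original pointer, so a heap-allocated buffer could not be freed afterwards. Declaring restype as ctypes.c_void_p and reading the data with ctypes.string_at keeps the pointer available for the library's own free routine.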


10-18 23:13