Question
I am using numpy.fromfile to construct an array which I can pass to the pandas.DataFrame constructor:
import numpy as np
import pandas as pd

def read_best_file(file, **kwargs):
    '''
    Loads best price data into a dataframe
    '''
    names = [ 'time', 'bid_size', 'bid_price', 'ask_size', 'ask_price' ]
    formats = [ 'u8', 'i4', 'f8', 'i4', 'f8' ]
    offsets = [ 0, 8, 12, 20, 24 ]

    dt = np.dtype({
        'names': names,
        'formats': formats,
        'offsets': offsets
    })
    return pd.DataFrame(np.fromfile(file, dt))
I would like to extend this function to work with gzipped files.
According to the numpy.fromfile documentation, the first parameter is file:
file : file or str
Open file object or filename
So I added the following check for a gzip file path:
if isinstance(file, str) and file.endswith(".gz"):
    file = gzip.open(file, "r")
However, when I pass the resulting object to numpy.fromfile, I get an IOError.
Question: How can I call numpy.fromfile with a gzipped file?
As requested in the comments, here is the implementation that checks for gzipped files:
import gzip  # needed for gzip.open below

def read_best_file(file, **kwargs):
    '''
    Loads best price data into a dataframe
    '''
    names = [ 'time', 'bid_size', 'bid_price', 'ask_size', 'ask_price' ]
    formats = [ 'u8', 'i4', 'f8', 'i4', 'f8' ]
    offsets = [ 0, 8, 12, 20, 24 ]

    dt = np.dtype({
        'names': names,
        'formats': formats,
        'offsets': offsets
    })

    # Swap the path for a gzip file object when the name ends in .gz
    if isinstance(file, str) and file.endswith(".gz"):
        file = gzip.open(file, "r")

    return pd.DataFrame(np.fromfile(file, dt))
Recommended answer
gzip.open() doesn't return a true file object. It is a duck-typed one: it walks like a duck and sounds like a duck, but as far as numpy is concerned it isn't quite a duck. numpy is being strict here (since much of it is written in lower-level C code, it might require an actual file descriptor).
You can get the underlying file from the gzip.open() call, but that only gives you the compressed stream.
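To make that point concrete, here is a minimal sketch, assuming Python 3, where gzip.open() in binary mode returns a gzip.GzipFile whose fileobj attribute is the underlying raw file. The temporary path and the variable names are made up for illustration.

```python
import gzip
import os
import tempfile

# Write a small gzipped file to inspect (a throwaway temp path, for illustration).
path = os.path.join(tempfile.mkdtemp(), "foo.txt.gz")
with gzip.open(path, "wb") as f:
    f.write(b"hello world\n")

# Reading through the GzipFile wrapper yields the decompressed data.
with gzip.open(path, "rb") as gz:
    decompressed = gz.read()

# Reading from the underlying file object yields the raw compressed bytes,
# which begin with the two-byte gzip magic number 0x1f 0x8b.
with gzip.open(path, "rb") as gz:
    raw_head = gz.fileobj.read(2)
```

So handing the underlying file to numpy would make it parse compressed bytes as records, which is not what you want.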
This is what I would do: use subprocess.Popen() to invoke zcat and decompress the file as a stream.
>>> import subprocess
>>> p = subprocess.Popen(["/usr/bin/zcat", "foo.txt.gz"], stdout=subprocess.PIPE)
>>> type(p.stdout)
<type 'file'>
>>> p.stdout.read()
'hello world\n'
Now you can pass p.stdout as a file object to numpy:
np.fromfile(p.stdout, ...)
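If you would rather not depend on an external zcat binary, a different technique (not the pipe approach above) is to decompress into memory with gzip and parse the bytes with np.frombuffer. This is only a sketch under assumptions: the whole file has to fit in memory, the helper name read_best_array and the constant BEST_DTYPE are hypothetical, and the record layout is copied from the question.

```python
import gzip

import numpy as np

# Record layout copied from the question (32 bytes per record).
BEST_DTYPE = np.dtype({
    'names': ['time', 'bid_size', 'bid_price', 'ask_size', 'ask_price'],
    'formats': ['u8', 'i4', 'f8', 'i4', 'f8'],
    'offsets': [0, 8, 12, 20, 24],
})


def read_best_array(path):
    """Hypothetical helper: load records from a plain or gzipped file."""
    if path.endswith(".gz"):
        # Decompress the whole file into a bytes object, then view it
        # as an array of records without copying.
        with gzip.open(path, "rb") as f:
            raw = f.read()
        return np.frombuffer(raw, dtype=BEST_DTYPE)
    # Uncompressed files can still go straight through np.fromfile.
    return np.fromfile(path, dtype=BEST_DTYPE)
```

The result can be passed to pd.DataFrame(...) just as in the question. Note that an array created by np.frombuffer from a bytes object is read-only, so call .copy() on it if you need to modify it in place.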