Fastest way to dump a NumPy array into a string



Question


I need to organize a data file with chunks of named data. The data are NumPy arrays. But I don't want to use the numpy.save or numpy.savez functions, because in some cases the data have to be sent to a server over a pipe or another interface. So I want to dump a NumPy array into memory, compress it, and then send it to the server.

I've tried simple pickle, like this:

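try:
    import cPickle as pkl  # Python 2: C-accelerated pickle module
except ImportError:
    import pickle as pkl   # Python 3: pickle already uses the C implementation
import zlib
import numpy as np

def send_to_db(data, compress=5):
    # send() is the asker's own transport function (not shown here)
    send(zlib.compress(pkl.dumps(data), compress))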

...but this is an extremely slow process.

Even with compression level 0 (no compression), the process is very slow, purely because of the pickling step.
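One likely cause of the slowness: on Python 2, cPickle.dumps defaults to protocol 0, a text-based format that escapes the array's raw bytes. Requesting a binary protocol is usually much faster for large arrays. On Python 3 the default is already a binary protocol. The following is a minimal sketch of the question's pipeline using pickle.HIGHEST_PROTOCOL. The names pickle_array and unpickle_array are hypothetical, and the network send step is left out:

```python
import pickle
import zlib

import numpy as np


def pickle_array(arr, level=5):
    """Hypothetical helper: pickle with the highest binary protocol, then compress.

    Assumes both ends run compatible Python/NumPy versions, because the
    receiver has to unpickle exactly what the sender produced.
    """
    payload = pickle.dumps(arr, protocol=pickle.HIGHEST_PROTOCOL)
    return zlib.compress(payload, level)


def unpickle_array(blob):
    # Only unpickle data from a trusted source: unpickling can execute code.
    return pickle.loads(zlib.decompress(blob))


arr = np.random.rand(100, 100)
blob = pickle_array(arr, level=0)  # level 0 = no compression, as in the question
restored = unpickle_array(blob)
print(np.array_equal(arr, restored))  # True
```

Timing this against the default protocol on your own data is the reliable way to confirm whether the protocol was the bottleneck.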

Is there any way to dump a NumPy array into a string without pickle? I know that NumPy lets you get a buffer with numpy.getbuffer, but it isn't obvious to me how to use this dumped buffer to obtain an array back.
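On current NumPy, ndarray.tobytes() and numpy.frombuffer() are the usual pair for a raw-buffer round trip. The catch is that the raw bytes carry neither the dtype nor the shape, so the receiver has to learn both some other way. The following is a minimal sketch, assuming the dtype and shape travel alongside the payload. The helper names array_to_bytes and bytes_to_array are hypothetical:

```python
import zlib

import numpy as np


def array_to_bytes(arr, level=5):
    # Raw array data in C order, compressed; dtype and shape are NOT included.
    return zlib.compress(arr.tobytes(), level)


def bytes_to_array(blob, dtype, shape):
    # np.frombuffer builds a 1-D array over the decompressed bytes;
    # reshape restores the original dimensions.
    return np.frombuffer(zlib.decompress(blob), dtype=dtype).reshape(shape)


arr = np.random.rand(100, 100)
blob = array_to_bytes(arr)
restored = bytes_to_array(blob, arr.dtype, arr.shape)  # metadata sent separately
print(np.array_equal(arr, restored))  # True
```

The array returned by np.frombuffer wraps immutable bytes, so it is read-only; call .copy() if you need to modify it. Keeping track of dtype and shape yourself is exactly the bookkeeping that the .npy header written by numpy.save handles for you.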

Solution

You should definitely use numpy.save; you can still do it in memory:

>>> import io
>>> import numpy as np
>>> import zlib
>>> f = io.BytesIO()
>>> arr = np.random.rand(100, 100)
>>> np.save(f, arr)
>>> compressed = zlib.compress(f.getvalue())

And to decompress, reverse the process:

>>> np.load(io.BytesIO(zlib.decompress(compressed)))
array([[ 0.80881898,  0.50553303,  0.03859795, ...,  0.05850996,
         0.9174782 ,  0.48671767],
       [ 0.79715979,  0.81465744,  0.93529834, ...,  0.53577085,
         0.59098735,  0.22716425],
       [ 0.49570713,  0.09599001,  0.74023709, ...,  0.85172897,
         0.05066641,  0.10364143],
       ...,
       [ 0.89720137,  0.60616688,  0.62966729, ...,  0.6206728 ,
         0.96160519,  0.69746633],
       [ 0.59276237,  0.71586014,  0.35959289, ...,  0.46977027,
         0.46586237,  0.10949621],
       [ 0.8075795 ,  0.70107856,  0.81389246, ...,  0.92068768,
         0.38013495,  0.21489793]])
>>>

As you can see, this matches what we saved earlier:

>>> arr
array([[ 0.80881898,  0.50553303,  0.03859795, ...,  0.05850996,
         0.9174782 ,  0.48671767],
       [ 0.79715979,  0.81465744,  0.93529834, ...,  0.53577085,
         0.59098735,  0.22716425],
       [ 0.49570713,  0.09599001,  0.74023709, ...,  0.85172897,
         0.05066641,  0.10364143],
       ...,
       [ 0.89720137,  0.60616688,  0.62966729, ...,  0.6206728 ,
         0.96160519,  0.69746633],
       [ 0.59276237,  0.71586014,  0.35959289, ...,  0.46977027,
         0.46586237,  0.10949621],
       [ 0.8075795 ,  0.70107856,  0.81389246, ...,  0.92068768,
         0.38013495,  0.21489793]])
>>>
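The question mentions a file made of named chunks. The same in-memory trick extends to numpy.savez, which stores each keyword argument as a named entry. The following is a minimal sketch that also checks the round trip with np.array_equal instead of comparing printed output by eye. The names dump_chunks and load_chunks, and the example chunk names, are hypothetical:

```python
import io
import zlib

import numpy as np


def dump_chunks(arrays, level=5):
    """Serialize a dict of named arrays into one compressed bytes object."""
    buf = io.BytesIO()
    np.savez(buf, **arrays)  # each dict key becomes a named entry in the archive
    return zlib.compress(buf.getvalue(), level)


def load_chunks(blob):
    """Inverse of dump_chunks: return a dict mapping names to arrays."""
    # allow_pickle=False refuses object arrays, so loading cannot run pickled code.
    with np.load(io.BytesIO(zlib.decompress(blob)), allow_pickle=False) as npz:
        return {name: npz[name] for name in npz.files}


chunks = {"temperature": np.random.rand(100, 100), "ids": np.arange(10)}
blob = dump_chunks(chunks)
restored = load_chunks(blob)
print(all(np.array_equal(chunks[k], restored[k]) for k in chunks))  # True
```

As an alternative, numpy.savez_compressed applies compression inside the zip archive itself, so the separate zlib step could be dropped.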


08-14 03:19