Problem description
I generate an npz file as follows:
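import numpy as np

# Generate npz file
dataset_text_filepath = 'test_np_load.npz'
texts = []
for text_number in range(30000):
    # Each "text" is a 1-D integer array of random length (0 to 100).
    # np.random.random_integers is deprecated; randint excludes the upper bound.
    texts.append(np.random.randint(0, 20001,
                                   size=np.random.randint(0, 101)))
# The rows have different lengths, so store them in an object array
texts = np.array(texts, dtype=object)
np.savez(dataset_text_filepath, texts=texts)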
This gives me a ~7 MiB npz file (basically only one variable, texts, which is a NumPy array of NumPy arrays), which I load with numpy.load():
# Load data (recent NumPy versions need allow_pickle=True to read object arrays)
dataset = np.load(dataset_text_filepath, allow_pickle=True)
If I query it as follows, it takes several minutes:
# Querying data: the slow way
for i in range(20):
    print('Run {0}'.format(i))
    random_indices = np.random.randint(0, len(dataset['texts']), size=10)
    dataset['texts'][random_indices]
whereas if I query it as follows, it takes less than 5 seconds:
# Querying data: the fast way
data_texts = dataset['texts']
for i in range(20):
    print('Run {0}'.format(i))
    random_indices = np.random.randint(0, len(data_texts), size=10)
    data_texts[random_indices]
Why is the second method so much faster than the first one?
Recommended answer
dataset['texts'] reads the file each time it is used. Calling load on an npz file just returns a file loader, not the actual data. It is a "lazy loader" that loads a particular array only when it is accessed. In the slow loop, dataset['texts'] is evaluated twice per iteration, so the whole array is read from disk and rebuilt 40 times; in the fast version it is read once, before the loop. The load docs could be clearer, but they say:
- If the file is a ``.npz`` file, the returned value supports the context
  manager protocol in a similar fashion to the open function::

    with load('foo.npz') as data:
        a = data['a']

  The underlying file descriptor is closed when exiting the 'with' block.
and from the savez docs:
When opening the saved ``.npz`` file with `load` a `NpzFile` object is
returned. This is a dictionary-like object which can be queried for
its list of arrays (with the ``.files`` attribute), and for the arrays
themselves.
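The two documentation excerpts above can be combined into a short sketch. It assumes a small throwaway archive (the file name demo_files.npz and the array names texts and labels are made up for illustration) and shows one way to copy every array out of the NpzFile once, so later lookups hit memory rather than the file:

```python
import os
import tempfile

import numpy as np

# Hypothetical demo archive; the file name and array names are illustrative only.
path = os.path.join(tempfile.mkdtemp(), 'demo_files.npz')
np.savez(path, texts=np.arange(5), labels=np.zeros(3))

with np.load(path) as npz:
    # .files lists the names of the arrays stored in the archive.
    names = sorted(npz.files)
    # Each npz[name] access reads from the archive, so do it once per array.
    arrays = {name: npz[name] for name in names}

# The archive is closed here, but the arrays are ordinary in-memory objects.
print(names)
print(arrays['texts'].sum())
```

After the with block, arrays is a plain dict, so repeated indexing such as arrays['texts'][i] no longer touches the disk.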
There is more detail in help(np.lib.npyio.NpzFile).
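To see the lazy loading directly, here is a rough sketch on a smaller version of the question's data (2000 ragged rows instead of 30000; the file name demo_reload.npz is made up). It checks that two accesses return two distinct array objects, then times repeated access against a cached reference. The absolute timings depend on the machine, so none are quoted here.

```python
import os
import tempfile
import time

import numpy as np

# Hypothetical, scaled-down version of the dataset from the question.
path = os.path.join(tempfile.mkdtemp(), 'demo_reload.npz')
rng = np.random.default_rng(0)
ragged = np.empty(2000, dtype=object)
for i in range(2000):
    ragged[i] = rng.integers(0, 20001, size=rng.integers(0, 101))
np.savez(path, texts=ragged)

dataset = np.load(path, allow_pickle=True)

# Each lookup reads the archive again, so the results are separate objects.
first = dataset['texts']
second = dataset['texts']
same_object = first is second

# Slow pattern: the whole array is reloaded on every iteration.
start = time.perf_counter()
for _ in range(20):
    dataset['texts'][0]
slow_seconds = time.perf_counter() - start

# Fast pattern: load once, then index the in-memory array.
cached = dataset['texts']
start = time.perf_counter()
for _ in range(20):
    cached[0]
fast_seconds = time.perf_counter() - start

dataset.close()
print('same object:', same_object)
print('reload each time: {0:.4f} s, cached: {1:.6f} s'.format(slow_seconds, fast_seconds))
```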