Memory error: numpy.genfromtxt()

Problem description

I have a file containing a 50,000 x 5,000 matrix of floats. When I use x = np.genfromtxt(readFrom, dtype=float) to load the file into memory, I get a memory error.

I want to load the whole file into memory because I am calculating the Euclidean distance between each pair of vectors using SciPy: dis = scipy.spatial.distance.euclidean(x[row1], x[row2])

Is there an efficient way to load a huge matrix file into memory?

Thanks.

Update:

I managed to solve the problem. Here is my solution. I am not sure whether it is efficient or logically correct, but it works fine for me:

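import numpy as np
import scipy.spatial.distance

# Parse the file line by line into float32 rows; the with-block closes the file
with open(readFrom, 'r') as f:
    y = np.asarray([np.array(s.split()).astype('float32') for s in f], dtype=np.float32)
# ....
dis = scipy.spatial.distance.euclidean(y[row1], y[row2])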

Please help me improve my solution.

Recommended answer

You're actually using 8-byte floats, since Python's float corresponds to C's double (at least on most systems):

import numpy as np

a = np.arange(10, dtype=float)
print(a.dtype)  # float64
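
To put numbers on this, here is a rough back-of-the-envelope sketch of the final array size for the 50,000 x 5,000 matrix from the question. The helper name array_bytes is made up for illustration, and the figures cover only the finished array; any temporary objects created while parsing the text come on top of that.

```python
import numpy as np

ROWS, COLS = 50_000, 5_000  # matrix shape taken from the question


def array_bytes(dtype):
    """Bytes needed to hold a ROWS x COLS array of the given dtype (hypothetical helper)."""
    return ROWS * COLS * np.dtype(dtype).itemsize


for dtype in (np.float64, np.float32):
    size = array_bytes(dtype)
    print(f"{np.dtype(dtype).name}: {size / 1e9:.1f} GB ({size / 2**30:.2f} GiB)")
# float64: 2.0 GB (1.86 GiB)
# float32: 1.0 GB (0.93 GiB)
```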

You should specify your data type as np.float32 (for example, dtype=np.float32 instead of dtype=float), which halves the memory the array needs. Depending on your OS, whether it is 32-bit or 64-bit, and whether you're using 32-bit or 64-bit Python, the address space available for NumPy to use could be smaller than your 4 GB, which could be an issue here as well.
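
As one possible way to apply this, the sketch below fills a preallocated float32 array one line at a time, so the whole file is never held as a list of strings. It is not the asker's code: the function name load_matrix_float32 and its n_rows/n_cols parameters are hypothetical, and it assumes a whitespace-delimited file whose dimensions are known in advance.

```python
import numpy as np


def load_matrix_float32(path, n_rows, n_cols):
    """Parse a whitespace-delimited text matrix into a preallocated float32 array.

    Hypothetical helper: assumes the caller already knows the matrix shape.
    """
    out = np.empty((n_rows, n_cols), dtype=np.float32)
    rows_read = 0
    with open(path, "r") as f:
        for line in f:
            fields = line.split()
            if not fields:
                continue  # skip blank lines
            if rows_read >= n_rows:
                raise ValueError(f"file has more than {n_rows} rows")
            # Assigning a list of the wrong length raises ValueError
            out[rows_read] = [float(v) for v in fields]
            rows_read += 1
    if rows_read != n_rows:
        raise ValueError(f"expected {n_rows} rows, found {rows_read}")
    return out
```

For the matrix in the question, the call would look like x = load_matrix_float32(readFrom, 50_000, 5_000). Only one parsed line exists alongside the output array at any time, at the cost of a Python-level loop over the rows.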
