Problem description
I am saving a numpy sparse array (converted to dense) to a CSV file, and the result is a 3GB file. The problem is that 95% of the cells are 0.0000. I used fmt='%5.4f'. How can I format and save the array so that zeros are written as just 0 while nonzero floats are written with the '%5.4f' format? I am sure I can get the 3GB down to 300MB if I can do this.
I am using
np.savetxt('foo.csv', arrayDense, fmt='%5.4f', delimiter = ',')
Thanks and regards
Recommended answer
If you look at the source code of np.savetxt, you'll see that, while there is quite a bit of code to handle the arguments and the differences between Python 2 and Python 3, it is ultimately a simple Python loop over the rows, in which each row is formatted and written to the file. So you won't lose any performance if you write your own. For example, here's a pared-down function that writes compact zeros:
def savetxt_compact(fname, x, fmt="%.6g", delimiter=','):
    with open(fname, 'w') as fh:
        for row in x:
            line = delimiter.join("0" if value == 0 else fmt % value for value in row)
            fh.write(line + '\n')
For example:
In [70]: x
Out[70]:
array([[ 0. , 0. , 0. , 0. , 1.2345 ],
[ 0. , 9.87654321, 0. , 0. , 0. ],
[ 0. , 3.14159265, 0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0. ]])
In [71]: savetxt_compact('foo.csv', x, fmt='%.4f')
In [72]: !cat foo.csv
0,0,0,0,1.2345
0,9.8765,0,0,0
0,3.1416,0,0,0
0,0,0,0,0
0,0,0,0,0
0,0,0,0,0
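If you want some reassurance that the compact file still parses to the same numbers, a quick round-trip check along these lines might help. This is only a sketch: roundtrip_matches is a hypothetical helper name that is not part of numpy, and the tolerance of one unit in the last written decimal place is my assumption about what counts as "the same" after formatting with '%.4f'.

```python
import numpy as np


def savetxt_compact(fname, x, fmt="%.6g", delimiter=","):
    # Same function as above: exact zeros are written as "0",
    # everything else is formatted with fmt.
    with open(fname, "w") as fh:
        for row in x:
            line = delimiter.join("0" if value == 0 else fmt % value for value in row)
            fh.write(line + "\n")


def roundtrip_matches(fname, x, decimals=4, delimiter=","):
    # Hypothetical helper: reload the file and compare it with the original
    # array, allowing for the rounding introduced by the format string.
    x = np.atleast_2d(np.asarray(x, dtype=float))
    loaded = np.loadtxt(fname, delimiter=delimiter, ndmin=2)
    if loaded.shape != x.shape:
        return False
    # Assumed tolerance: one unit in the last written decimal place.
    return bool(np.allclose(loaded, x, rtol=0.0, atol=10.0 ** -decimals))
```

Note that np.loadtxt reads the bare "0" fields back as 0.0, so nothing is lost for the zero cells; only the nonzero values are affected by the rounding in fmt.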
Then, as long as you are writing your own savetxt function, you might as well make it handle sparse matrices, so you don't have to convert the matrix to a (dense) numpy array before saving it. (I assume the sparse array is implemented using one of the sparse representations from scipy.sparse.) In the following function, the only change is from "... for value in row" to "... for value in row.A[0]". Here row is a one-row sparse matrix, row.A is its dense 2-D equivalent, and [0] picks out that single row, so only one row is dense in memory at a time.
def savetxt_sparse_compact(fname, x, fmt="%.6g", delimiter=','):
    with open(fname, 'w') as fh:
        for row in x:
            line = delimiter.join("0" if value == 0 else fmt % value for value in row.A[0])
            fh.write(line + '\n')
Example:
In [112]: a
Out[112]:
<6x5 sparse matrix of type '<type 'numpy.float64'>'
with 3 stored elements in Compressed Sparse Row format>
In [113]: a.A
Out[113]:
array([[ 0. , 0. , 0. , 0. , 1.2345 ],
[ 0. , 9.87654321, 0. , 0. , 0. ],
[ 0. , 3.14159265, 0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0. ]])
In [114]: savetxt_sparse_compact('foo.csv', a, fmt='%.4f')
In [115]: !cat foo.csv
0,0,0,0,1.2345
0,9.8765,0,0,0
0,3.1416,0,0,0
0,0,0,0,0
0,0,0,0,0
0,0,0,0,0
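If densifying each row is a concern, another option is to fill the row directly from the stored entries of the CSR representation. The following is a sketch, not a tested replacement for the function above: savetxt_csr_compact is a name I made up, and it assumes the input is a scipy.sparse object that supports tocsr(). It uses only the standard CSR attributes indptr, indices and data.

```python
from scipy import sparse


def savetxt_csr_compact(fname, x, fmt="%.6g", delimiter=","):
    # Work on a CSR copy so the caller's matrix is left untouched,
    # and merge any duplicate entries so each (row, col) appears once.
    csr = x.tocsr(copy=True)
    csr.sum_duplicates()
    n_rows, n_cols = csr.shape
    with open(fname, "w") as fh:
        for i in range(n_rows):
            # Start from an all-"0" row, then overwrite the stored entries.
            fields = ["0"] * n_cols
            start, stop = csr.indptr[i], csr.indptr[i + 1]
            for j, value in zip(csr.indices[start:stop], csr.data[start:stop]):
                # Explicitly stored zeros are still written as "0".
                if value != 0:
                    fields[j] = fmt % value
            fh.write(delimiter.join(fields) + "\n")
```

For the 6x5 matrix in the example it should produce the same foo.csv. Whether it is actually faster than the row.A[0] version will depend on the matrix, so it is worth timing both on your own data.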