Problem description
I would like to load a big text file (around 1 GB with 3*10^6 rows and 10 - 100 columns) as a 2D np-array containing strings. However, it seems like numpy.loadtxt() only takes floats as default. Is it possible to specify another data type for the entire array? I've tried the following without luck:
loadedData = np.loadtxt(address, dtype=np.str)
I get the following error message:
/Library/Python/2.7/site-packages/numpy-1.8.0.dev_20224ea_20121123-py2.7-macosx-10.8-x86_64.egg/numpy/lib/npyio.pyc in loadtxt(fname, dtype, comments, delimiter, converters, skiprows, usecols, unpack, ndmin)
833 fh.close()
834
--> 835 X = np.array(X, dtype)
836 # Multicolumn data are returned with shape (1, N, M), i.e.
837 # (1, 1, M) for a single row - remove the singleton dimension there
ValueError: cannot set an array element with a sequence
Any ideas? (I don't know the exact number of columns in my file beforehand.)
Recommended answer
Use genfromtxt instead. It is a more general method than loadtxt:
import numpy as np
print(np.genfromtxt('col.txt', dtype='str'))
With the file col.txt:
foo bar
cat dog
man wine
This gives:
[['foo' 'bar']
 ['cat' 'dog']
 ['man' 'wine']]
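Since the number of columns is not known in advance, it may help to check the shape of the result after loading. The following is a minimal, self-contained sketch of the same call: it reads the sample data from an in-memory io.StringIO buffer instead of col.txt (an assumption made only so the snippet runs without a file on disk), and it passes encoding="utf-8", which assumes NumPy 1.14 or later.

```python
import io

import numpy as np

# In-memory stand-in for col.txt (assumption: used so the sketch needs no file).
sample = io.StringIO("foo bar\ncat dog\nman wine\n")

# dtype=str keeps every field as text; the column count is inferred from the data.
# The encoding argument assumes NumPy 1.14 or later.
data = np.genfromtxt(sample, dtype=str, encoding="utf-8")

n_rows, n_cols = data.shape
print(n_rows, n_cols)   # number of rows and columns found in the data
print(data.dtype.kind)  # 'U' indicates a Unicode string array
```

With a real file, the buffer would simply be replaced by the file path.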
If you expect each row to have the same number of columns, read the first row and set the filling_values argument to fill in any missing entries.
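The idea of taking the column count from the first row can also be sketched by hand. Note that this is a different technique from genfromtxt with filling_values: the snippet below splits the lines itself and pads short rows in plain Python before building the array. The helper name load_padded and the empty-string fill value are hypothetical choices for illustration, and the sketch assumes whitespace-separated fields and a non-empty input.

```python
import io

import numpy as np


def load_padded(lines, fill=""):
    """Split whitespace-separated lines and pad short rows to the first row's width.

    Hypothetical helper for illustration; not part of NumPy.
    """
    # Skip blank lines and split each remaining line on whitespace.
    rows = [line.split() for line in lines if line.strip()]
    width = len(rows[0])  # column count taken from the first row
    # Truncate rows that are too long and pad rows that are too short.
    padded = [row[:width] + [fill] * (width - len(row)) for row in rows]
    return np.array(padded, dtype=str)


# The second row is missing its last field.
sample = io.StringIO("foo bar baz\ncat dog\nman wine beer\n")
table = load_padded(sample)
print(table)
```

For a file on disk, an open file object can be passed in place of the buffer, for example `with open(address) as fh: table = load_padded(fh)`. Because this builds a full list of Python strings before converting, it may need considerably more memory than the final array on a file of around 1 GB.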