Problem description
I have a list X_train (more than 20,000 elements), where each element is a sparse scipy csr_matrix created by HashingVectorizer.transform().
My code calls HashingVectorizer.transform() on the input file line by line and appends each result to the list X_train.
I'm trying to train an SGDClassifier using X_train, but I get the error:
ValueError: setting an array element with a sequence.
How can I train the SGDClassifier without a CPU- or memory-intensive operation?
Recommended answer
Here is a list of sparse matrices, and several ways of turning it into an array or a sparse matrix (some work, some do not):
In [916]: alist=[sparse.random(1,10,.2, format='csr') for _ in range(3)]
In [917]: alist
Out[917]:
[<1x10 sparse matrix of type '<class 'numpy.float64'>'
with 2 stored elements in Compressed Sparse Row format>,
<1x10 sparse matrix of type '<class 'numpy.float64'>'
with 2 stored elements in Compressed Sparse Row format>,
<1x10 sparse matrix of type '<class 'numpy.float64'>'
with 2 stored elements in Compressed Sparse Row format>]
Making a proper 2-D sparse matrix:
In [918]: sparse.vstack(alist)
Out[918]:
<3x10 sparse matrix of type '<class 'numpy.float64'>'
with 6 stored elements in Compressed Sparse Row format>
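Applied to the question, the stacked matrix can be passed straight to the classifier. The following is only a minimal sketch, assuming scikit-learn is installed; the sample lines, the labels, and n_features=2**10 are made up for illustration and are not from the original post.

```python
from scipy import sparse
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import SGDClassifier

# Hypothetical data standing in for the lines of the input file.
lines = ["spam spam eggs", "ham and eggs", "spam only", "ham sandwich"]
y_train = [1, 0, 1, 0]

# n_features is an arbitrary small value chosen for this sketch.
vectorizer = HashingVectorizer(n_features=2**10)

# Reproduce the situation in the question:
# one 1 x n_features csr_matrix per input line, collected in a list.
X_list = [vectorizer.transform([line]) for line in lines]

# Stack the rows into a single 2-D sparse matrix; nothing is densified.
X_train = sparse.vstack(X_list, format="csr")

clf = SGDClassifier(random_state=0)
clf.fit(X_train, y_train)
```

sparse.vstack keeps the data sparse throughout, so the cost grows with the number of stored elements rather than with rows times columns.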
Wrapping the list in np.array gives an object array of matrices, which is not what an estimator needs:
In [919]: np.array(alist)
Out[919]:
array([ <1x10 sparse matrix of type '<class 'numpy.float64'>'
with 2 stored elements in Compressed Sparse Row format>,
<1x10 sparse matrix of type '<class 'numpy.float64'>'
with 2 stored elements in Compressed Sparse Row format>,
<1x10 sparse matrix of type '<class 'numpy.float64'>'
with 2 stored elements in Compressed Sparse Row format>], dtype=object)
Trying to make a float array from the list reproduces your error:
In [920]: np.array(alist, float)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-920-52d4689fa7b3> in <module>()
----> 1 np.array(alist, float)
ValueError: setting an array element with a sequence.
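If the list exists only because the file was transformed one line at a time, it may be simpler to avoid building it. HashingVectorizer.transform accepts an iterable of documents and returns one sparse matrix, and SGDClassifier.partial_fit can be fed one batch at a time when the whole file should not be held in memory. The sketch below shows both options under assumptions: the sample lines, the labels, and the batch size are hypothetical.

```python
import numpy as np
from scipy import sparse
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import SGDClassifier

# Hypothetical data standing in for the lines of the input file.
lines = ["spam spam eggs", "ham and eggs", "spam only", "ham sandwich"]
labels = np.array([1, 0, 1, 0])

vectorizer = HashingVectorizer(n_features=2**10)

# Option 1: transform all lines in one call; the result is already
# a single 2-D sparse matrix, so no list and no stacking is needed.
X_all = vectorizer.transform(lines)

# Option 2: train incrementally, one small batch at a time.
clf = SGDClassifier(random_state=0)
classes = np.unique(labels)  # partial_fit needs the full set of classes up front
batch_size = 2               # arbitrary value for this sketch
for start in range(0, len(lines), batch_size):
    stop = start + batch_size
    X_batch = vectorizer.transform(lines[start:stop])
    clf.partial_fit(X_batch, labels[start:stop], classes=classes)
```

HashingVectorizer is stateless, so transforming in batches yields the same feature columns as transforming everything at once.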