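I am trying to concatenate 2 sparse matrices with the hstack function. xtrain_cat is the output of DictVectorizer (encoded categorical values), and xtrain_num was read from a CSV file with pandas.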
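from scipy import sparse
from scipy.sparse import hstack

# xtrain_cat: sparse output of DictVectorizer; xtrain_num: pandas data read from CSV
xtrain_num = sparse.csr_matrix(xtrain_num)
print(type(xtrain_num))
print(xtrain_cat.shape)
print(xtrain_num.shape)
x_train_data = hstack(xtrain_cat, xtrain_num)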
Output and error:
(1000, 2778)
<class 'scipy.sparse.csr.csr_matrix'>
<class 'scipy.sparse.csr.csr_matrix'>
(1000, 2778)
(1000, 968)
Traceback (most recent call last):
  File "D:\Projects\Zohair\Bosch\Bosch.py", line 360, in <module>
    x_train_data = hstack(xtrain_cat,xtrain_num)
  File "C:\Users\Public\Documents\anaconda2\lib\site-packages\scipy\sparse\construct.py", line 464, in hstack
    return bmat([blocks], format=format, dtype=dtype)
  File "C:\Users\Public\Documents\anaconda2\lib\site-packages\scipy\sparse\construct.py", line 547, in bmat
    raise ValueError('blocks must be 2-D')
ValueError: blocks must be 2-D
Can anyone identify what the problem is?
Best answer
You should try:
x_train_data = hstack((xtrain_cat, xtrain_num))  # note the double parentheses: one tuple argument
It takes a single sequence argument; from the docstring:
blocks : sequence of sparse matrices with compatible shapes
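A minimal self-contained sketch of the fix, assuming small hypothetical stand-ins for the question's data: a one-hot-style CSR matrix in place of the DictVectorizer output, and a dense NumPy array (converted to CSR) in place of the numeric columns read with pandas. Both blocks have 1000 rows, as in the question, because hstack requires matching row counts. The essential part is that both matrices are passed inside one tuple; format="csr" is optional and only selects the output format.

```python
import numpy as np
from scipy import sparse

n_rows = 1000

# Hypothetical stand-in for DictVectorizer output: one 1.0 per row (one-hot style),
# row i has its 1.0 in column i % 28.
xtrain_cat = sparse.csr_matrix(
    (np.ones(n_rows), (np.arange(n_rows), np.arange(n_rows) % 28)),
    shape=(n_rows, 28),
)

# Hypothetical stand-in for the numeric columns loaded with pandas.
xtrain_num = sparse.csr_matrix(np.arange(n_rows * 9, dtype=float).reshape(n_rows, 9))

# hstack takes ONE argument: a sequence of blocks (here a tuple of two matrices).
x_train_data = sparse.hstack((xtrain_cat, xtrain_num), format="csr")

print(x_train_data.shape)  # → (1000, 37)
```

Passing a list, hstack([xtrain_cat, xtrain_num]), works the same way; any sequence of blocks is accepted.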
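With a defined as a sparse matrix, I can reproduce your error when the extra parentheses are omitted (and it goes away when they are added):
In [19]: sparse.hstack(a, a)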
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-19-7c450ab4fda0> in <module>()
----> 1 sparse.hstack(a, a)
/usr/local/lib/python2.7/dist-packages/scipy/sparse/construct.pyc in hstack(blocks, format, dtype)
454
455 """
--> 456 return bmat([blocks], format=format, dtype=dtype)
457
458
/usr/local/lib/python2.7/dist-packages/scipy/sparse/construct.pyc in bmat(blocks, format, dtype)
537
538 if blocks.ndim != 2:
--> 539 raise ValueError('blocks must be 2-D')
540
541 M,N = blocks.shape
ValueError: blocks must be 2-D
In [20]: sparse.hstack((a, a))
Out[20]:
<3x8 sparse matrix of type '<type 'numpy.float64'>'
with 0 stored elements in COOrdinate format>
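Why the unparenthesized call fails: the signature shown in the traceback is hstack(blocks, format, dtype), so in hstack(a, a) the second matrix is silently bound to format, and hstack then calls bmat([blocks], ...). With a lone matrix as blocks, that wrapper is a 1-D object array rather than a 2-D grid of blocks, which trips the blocks.ndim != 2 check. The sketch below reproduces both calls with an empty 3x4 matrix standing in for a (an assumption; the answer does not show how a was built). Newer SciPy versions may return a different output format than the COO shown above, so only the shape is checked.

```python
from scipy import sparse

a = sparse.csr_matrix((3, 4))  # empty 3x4 sparse matrix, standing in for `a` above

# Wrong: the second `a` lands in the `format` parameter, and the single
# matrix does not form a 2-D grid of blocks, so a ValueError is raised.
err = None
try:
    sparse.hstack(a, a)
except ValueError as exc:
    err = exc
print(type(err).__name__)  # → ValueError

# Right: one tuple holding both blocks forms a 1x2 grid of blocks.
b = sparse.hstack((a, a))
print(b.shape)  # → (3, 8)
```

Passing the argument by keyword, sparse.hstack(blocks=(a, a)), makes the intended call explicit and avoids this mix-up.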