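I am trying to concatenate two sparse matrices with the hstack function. xtrain_cat is the output of DictVectorizer (encoded categorical values), and xtrain_num comes from a CSV file read with pandas.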

    xtrain_num = sparse.csr_matrix(xtrain_num)
    print type(xtrain_num)
    print xtrain_cat.shape
    print xtrain_num.shape
    x_train_data = hstack(xtrain_cat,xtrain_num)


Error:

(1000, 2778)
<class 'scipy.sparse.csr.csr_matrix'>
<class 'scipy.sparse.csr.csr_matrix'>
(1000, 2778)
(1000, 968)
Traceback (most recent call last):
  File "D:\Projects\Zohair\Bosch\Bosch.py", line 360, in <module>
    x_train_data = hstack(xtrain_cat,xtrain_num)
  File "C:\Users\Public\Documents\anaconda2\lib\site-packages\scipy\sparse\construct.py", line 464, in hstack
    return bmat([blocks], format=format, dtype=dtype)
  File "C:\Users\Public\Documents\anaconda2\lib\site-packages\scipy\sparse\construct.py", line 547, in bmat
    raise ValueError('blocks must be 2-D')
ValueError: blocks must be 2-D


Can anyone identify what the problem is?

Best answer

You should try:

x_train_data = hstack((xtrain_cat,xtrain_num))


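hstack takes a single sequence of matrices as its first argument, blocks. In hstack(xtrain_cat, xtrain_num), the second matrix is passed as the format argument instead of being part of blocks. The documentation describes blocks as:


  sequence of sparse matrices with compatible shapes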
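
A minimal sketch of the corrected call, assuming stand-in matrices with the shapes reported in the question. The contents of xtrain_cat and xtrain_num here are hypothetical placeholders, not the asker's data:

```python
import numpy as np
from scipy import sparse

# Hypothetical stand-ins that only reproduce the shapes from the question.
xtrain_cat = sparse.eye(1000, 2778, format="csr")      # plays the role of the DictVectorizer output
xtrain_num = sparse.csr_matrix(np.ones((1000, 968)))   # plays the role of the numeric columns

# The signature is hstack(blocks, format=None, dtype=None), so both matrices
# go into one tuple (a list works too) that is passed as `blocks`.
x_train_data = sparse.hstack((xtrain_cat, xtrain_num))

print(x_train_data.shape)  # (1000, 3746)
```

Both blocks need the same number of rows (1000 here); the column counts simply add up.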




When I define a as a sparse matrix, I can reproduce your error by omitting the extra parentheses (and fix it by adding them):

In [19]: sparse.hstack(a, a)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-19-7c450ab4fda0> in <module>()
----> 1 sparse.hstack(a, a)

/usr/local/lib/python2.7/dist-packages/scipy/sparse/construct.pyc in hstack(blocks, format, dtype)
    454
    455     """
--> 456     return bmat([blocks], format=format, dtype=dtype)
    457
    458

/usr/local/lib/python2.7/dist-packages/scipy/sparse/construct.pyc in bmat(blocks, format, dtype)
    537
    538     if blocks.ndim != 2:
--> 539         raise ValueError('blocks must be 2-D')
    540
    541     M,N = blocks.shape

ValueError: blocks must be 2-D

In [20]: sparse.hstack((a, a))
Out[20]:
<3x8 sparse matrix of type '<type 'numpy.float64'>'
    with 0 stored elements in COOrdinate format>
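
The result in that session comes back in COO format. If a later step expects CSR (row slicing, for example), the format argument of hstack can request it directly. This is a small sketch, assuming a is an empty 3x4 matrix, which is only inferred from the 3x8 result shown in the session:

```python
from scipy import sparse

# Assumption: an empty 3x4 sparse matrix, inferred from the 3x8 result.
a = sparse.csr_matrix((3, 4))

# Passing format="csr" asks hstack to return a CSR matrix, whatever
# format it would otherwise pick for this SciPy version and these inputs.
stacked = sparse.hstack((a, a), format="csr")

print(stacked.shape, stacked.format)
```

Calling .tocsr() on the result of hstack is an equivalent alternative.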
