I have the following matrices:

>>> X1
shape: (2399, 39999)
type: scipy.sparse.csr.csr_matrix




>>> X2
shape: (2399, 333534)
type: scipy.sparse.csr.csr_matrix




>>> X3.reshape(-1, 1)
shape: (2399, 1)
type: <class 'numpy.ndarray'>


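How can I concatenate X1, X2 and X3 side by side (column-wise) to produce a new matrix of shape (2399, 373534)? I know this can be done with scipy's hstack or vstack. However, when I try: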

X_combined = sparse.hstack([X1,X2,X3.T])


Instead of the combined matrix, I get an error:

ValueError: all the input array dimensions except for the concatenation axis must match exactly


So how can I correctly concatenate them into a single matrix?

Update

from sklearn.feature_extraction.text import CountVectorizer
count_vect = CountVectorizer(min_df=5)
X1 = count_vect.fit_transform(X)




from sklearn.feature_extraction.text import TfidfVectorizer
tdidf_vect = TfidfVectorizer()
X2 = tdidf_vect.fit_transform(X)




from hdbscan import HDBSCAN
clusterer = HDBSCAN().fit(X1)
X3 = clusterer.labels_
print(X3.shape)
print(type(X3))


Then:

In:

import scipy as sparse

X_combined = sparse.hstack([X1,X2,X3.reshape(-1,1)])


Out:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-16-14baa47e0993> in <module>()
      5
      6
----> 7 X_combined = sparse.hstack([X1,X2,X3.reshape(-1,1)])

/usr/local/lib/python3.5/site-packages/numpy/core/shape_base.py in hstack(tup)
    284     # As a special case, dimension 0 of 1-dimensional arrays is "horizontal"
    285     if arrs[0].ndim == 1:
--> 286         return _nx.concatenate(arrs, 0)
    287     else:
    288         return _nx.concatenate(arrs, 1)

ValueError: all the input arrays must have same number of dimensions

Best answer

The problem is your import. It should be:

from scipy import sparse


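The top-level scipy module (which you generally should not use directly) imports NumPy functions, so with your version of the import you end up calling NumPy's hstack rather than the sparse one: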

>>> import scipy as sparse
>>> sparse.hstack
<function numpy.core.shape_base.hstack>

>>> # incorrect! Correct would be

>>> from scipy import sparse
>>> sparse.hstack
<function scipy.sparse.construct.hstack>
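If you are unsure which hstack a name resolves to, one way to check is the function's `__module__` attribute. The sketch below is only illustrative: it references `numpy.hstack` directly as a stand-in for what `import scipy as sparse` exposed in the asker's environment, and the exact module paths printed can differ between library versions.

```python
import numpy as np
from scipy import sparse

# NumPy's dense hstack: a stand-in for what the asker's `sparse.hstack`
# resolved to (we use numpy directly instead of the top-level scipy alias).
dense_hstack_origin = np.hstack.__module__

# SciPy's sparse-aware hstack, reached through the scipy.sparse submodule.
sparse_hstack_origin = sparse.hstack.__module__

print(dense_hstack_origin)   # a module path under "numpy"
print(sparse_hstack_origin)  # a module path under "scipy.sparse"
```

If the second value does not start with `scipy.sparse`, the name `sparse` is probably bound to something other than the sparse submodule.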


The SciPy documentation mentions this behavior of the top-level namespace:


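  The scipy namespace itself only contains functions imported from numpy. These functions still exist for backwards compatibility, but should be imported from numpy directly.
  
  Everything in the namespaces of scipy submodules is public. In general, it is recommended to import functions from submodule namespaces.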
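
Putting the fix together, here is a minimal sketch with small stand-in matrices. The shapes and values are made up for illustration; in the question, X1 and X2 come from the vectorizers and X3 from HDBSCAN. Wrapping the label column in `csr_matrix` before stacking is a cautious choice in this sketch, not something the original code did.

```python
import numpy as np
from scipy import sparse  # the submodule, not `import scipy as sparse`

# Hypothetical stand-ins for the real data: 4 rows instead of 2399.
X1 = sparse.csr_matrix(np.arange(12).reshape(4, 3))  # e.g. count features
X2 = sparse.csr_matrix(np.ones((4, 5)))              # e.g. tf-idf features
X3 = np.array([0, 1, -1, 1])                         # e.g. cluster labels, 1-D

# Turn the 1-D labels into an explicit (n_rows, 1) sparse column.
X3_col = sparse.csr_matrix(X3.reshape(-1, 1))

# scipy.sparse.hstack handles sparse inputs; format="csr" requests CSR output.
X_combined = sparse.hstack([X1, X2, X3_col], format="csr")

print(X_combined.shape)  # (4, 9): 3 + 5 + 1 columns
```

With the real data the same call would give 39999 + 333534 + 1 = 373534 columns, matching the shape asked for in the question.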

Source: the Stack Overflow question "python - Malformed matrix when using hstack?": https://stackoverflow.com/questions/43075235/
