Problem description
A dataset contains numerical and categorical variables, and I split them into two parts:
cont_data = data[cont_variables].values
disc_data = data[disc_variables].values
Then I use sklearn.preprocessing.OneHotEncoder to encode the categorical data, and then I tried to merge the encoded categorical data with the numerical data:
np.concatenate((cont_data, disc_data_coded), axis=1)
But the following error occurs:
ValueError: all the input arrays must have same number of dimensions
I checked that both inputs have the same number of dimensions:
print(cont_data.shape) # (24000, 35)
print(disc_data_coded.shape) # (24000, 26)
Finally, I found that cont_data is a numpy array, while disc_data_coded is a sparse matrix:
>>> disc_data_coded
<24000x26 sparse matrix of type '<class 'numpy.float64'>'
with 312000 stored elements in Compressed Sparse Row format>
I changed the parameter sparse in OneHotEncoder to False, and everything works. But the question is: how can I merge a numpy array with a sparse matrix directly, without setting sparse=False?
Recommended answer
Sparse matrices are not subclasses of numpy arrays, so numpy methods often don't work on them. Use sparse functions instead, such as sparse.vstack and sparse.hstack. But all inputs then have to be sparse.
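Applied to the column-wise merge in the question, the sparse route might look like the sketch below. The small arrays are hypothetical stand-ins for cont_data and disc_data_coded (the real ones are 24000 rows), and the dense block is converted explicitly with sparse.csr_matrix so that every input to sparse.hstack is sparse.

```python
import numpy as np
from scipy import sparse

# Hypothetical stand-ins for the arrays in the question (much smaller shapes).
cont_data = np.arange(12, dtype=float).reshape(4, 3)   # dense numeric block
disc_data_coded = sparse.csr_matrix(np.eye(4))          # sparse one-hot block

# Convert the dense block to sparse, then stack the two blocks side by side.
merged = sparse.hstack(
    (sparse.csr_matrix(cont_data), disc_data_coded),
    format="csr",
)

print(merged.shape)             # (4, 7)
print(sparse.issparse(merged))  # True
```

The result stays sparse, which is usually what you want if the one-hot block is large and mostly zeros.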
Or make the sparse matrix dense first, with .toarray(), and use np.concatenate.
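For the merge in the question, the dense route could be sketched as follows. Again the small arrays are hypothetical stand-ins for cont_data and disc_data_coded. Note that .toarray() materializes every zero, so this only makes sense when the dense result fits comfortably in memory.

```python
import numpy as np
from scipy import sparse

# Hypothetical stand-ins for the arrays in the question (much smaller shapes).
cont_data = np.arange(12, dtype=float).reshape(4, 3)   # dense numeric block
disc_data_coded = sparse.csr_matrix(np.eye(4))          # sparse one-hot block

# Densify the sparse block, then concatenate column-wise as in the question.
merged = np.concatenate((cont_data, disc_data_coded.toarray()), axis=1)

print(type(merged).__name__, merged.shape)  # ndarray (4, 7)
```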
Do you want the result to be sparse or dense?
In [32]: sparse.vstack((sparse.csr_matrix(np.arange(10)),sparse.csr_matrix(np.ones((3,10)))))
Out[32]:
<4x10 sparse matrix of type '<class 'numpy.float64'>'
with 39 stored elements in Compressed Sparse Row format>
In [33]: np.concatenate((sparse.csr_matrix(np.arange(10)).A,np.ones((3,10))))
Out[33]:
array([[0., 1., 2., 3., 4., 5., 6., 7., 8., 9.],
[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]])