I created features X and labels y for the dataset I am working on.

Now I want to train a random forest classifier on it, but when I fit the classifier to the training data I get ValueError: setting an array element with a sequence.

Below are X, y and the error details:

X:

(array([-8.1530527e-10,  8.9952795e-10, -9.1185753e-10, ...,
         0.0000000e+00,  0.0000000e+00,  0.0000000e+00], dtype=float32),
 array([0., 0., 0., ..., 0., 0., 0.], dtype=float32),
 array([0., 0., 0., ..., 0., 0., 0.], dtype=float32),
 array([-0.00050612, -0.00057967, -0.00035985, ...,  0.        ,
         0.        ,  0.        ], dtype=float32),
 array([ 6.8139506e-08, -2.3837963e-05, -2.4622474e-05, ...,
         3.1678758e-06, -2.4535689e-06,  0.0000000e+00], dtype=float32),
 array([ 0.0000000e+00,  0.0000000e+00,  0.0000000e+00, ...,
         6.9306935e-07, -6.6020442e-07,  0.0000000e+00], dtype=float32),
 array([-7.30260945e-05, -1.18022966e-04, -1.08280736e-04, ...,
         8.83421380e-05,  4.97258679e-06,  0.00000000e+00], dtype=float32),
 array([0., 0., 0., ..., 0., 0., 0.], dtype=float32),
 array([ 2.3406714e-05,  3.1186773e-05,  4.9467826e-06, ...,
         1.2180173e-07, -9.2944845e-08,  0.0000000e+00], dtype=float32),
 array([ 1.1845550e-06, -1.6399191e-06,  2.5565218e-06, ...,
        -8.7445065e-09,  5.9859917e-09,  0.0000000e+00], dtype=float32),
 array([0., 0., 0., ..., 0., 0., 0.], dtype=float32),
 array([-1.3284328e-05, -7.4090644e-07,  7.2679302e-07, ...,
         0.0000000e+00,  0.0000000e+00,  0.0000000e+00], dtype=float32),
 array([ 0.0000000e+00,  0.0000000e+00,  0.0000000e+00, ...,
         5.0694009e-08, -3.4546797e-08,  0.0000000e+00], dtype=float32),
 array([ 1.5591205e-07, -1.5845627e-07,  1.5362870e-07, ...,
         0.0000000e+00,  0.0000000e+00,  0.0000000e+00], dtype=float32),
 array([0., 0., 0., ..., 0., 0., 0.], dtype=float32),
 array([0.0000000e+00, 0.0000000e+00, 0.0000000e+00, ..., 1.1608539e-05,
        8.2463991e-09, 0.0000000e+00], dtype=float32),
 array([-3.6192148e-07, -1.4590451e-05, -5.3999561e-06, ...,
        -1.9935460e-05, -3.4417746e-05,  0.0000000e+00], dtype=float32),
 array([ 0.0000000e+00,  0.0000000e+00,  0.0000000e+00, ...,
        -2.5319534e-07,  2.6521766e-07,  0.0000000e+00], dtype=float32),
 array([ 0.0000000e+00,  0.0000000e+00,  0.0000000e+00, ...,
        -2.5055220e-08,  1.2936166e-08,  0.0000000e+00], dtype=float32),
 array([0., 0., 0., ..., 0., 0., 0.], dtype=float32),
 array([ 1.3387315e-05,  6.0913658e-07, -5.6471418e-07, ...,
         0.0000000e+00,  0.0000000e+00,  0.0000000e+00], dtype=float32),
 array([ 1.7200684e-02,  3.2272514e-02,  3.2961801e-02, ...,
        -1.6286784e-06, -8.5592075e-07,  0.0000000e+00], dtype=float32),
 array([0., 0., 0., ..., 0., 0., 0.], dtype=float32),
 array([ 0.0000000e+00,  0.0000000e+00,  0.0000000e+00, ...,
        -3.3923173e-11,  2.8026699e-11,  0.0000000e+00], dtype=float32),
 array([-0.00103188, -0.00075814, -0.00051426, ...,  0.        ,
         0.        ,  0.        ], dtype=float32),
 array([ 7.6278877e-07,  2.1624428e-05,  1.1150542e-05, ...,
         1.8263392e-09, -1.5558380e-09,  0.0000000e+00], dtype=float32),
 array([-1.2111740e-07,  6.3130176e-07, -1.8378003e-06, ...,
         1.1309878e-05,  5.4562256e-06,  0.0000000e+00], dtype=float32),
 array([0.00026949, 0.00028119, 0.00020081, ..., 0.00032586, 0.00046612,
        0.        ], dtype=float32),
 array([ 0.0000000e+00,  0.0000000e+00,  0.0000000e+00, ...,
        -7.8796054e-09,  1.7431153e-08,  0.0000000e+00], dtype=float32),
 array([1.42000988e-06, 1.30781755e-05, 2.77493709e-05, ...,
        0.00000000e+00, 0.00000000e+00, 0.00000000e+00], dtype=float32),
 array([ 2.9161662e-10, -6.3629275e-11, -3.0565092e-10, ...,
         0.0000000e+00,  0.0000000e+00,  0.0000000e+00], dtype=float32),
 array([ 2.2051008e-05,  1.6838792e-05,  3.5639907e-05, ...,
         4.5767497e-06, -1.2002213e-05,  0.0000000e+00], dtype=float32),
 array([0., 0., 0., ..., 0., 0., 0.], dtype=float32),
 array([ 0.0000000e+00,  0.0000000e+00,  0.0000000e+00, ...,
        -2.0104826e-10,  1.6824393e-10,  0.0000000e+00], dtype=float32),
 array([ 0.0000000e+00,  0.0000000e+00,  0.0000000e+00, ...,
        -4.8303300e-06, -1.2008861e-05,  0.0000000e+00], dtype=float32),
 array([0., 0., 0., ..., 0., 0., 0.], dtype=float32),
 array([ 0.0000000e+00,  0.0000000e+00,  0.0000000e+00, ...,
        -2.7673337e-07,  2.8604177e-07,  0.0000000e+00], dtype=float32),
 array([-0.00066044, -0.0009837 , -0.00090796, ..., -0.00171516,
        -0.0017666 ,  0.        ], dtype=float32),
 array([ 3.2218946e-11, -5.5296181e-11,  8.9530647e-11, ...,
         0.0000000e+00,  0.0000000e+00,  0.0000000e+00], dtype=float32),
 array([0., 0., 0., ..., 0., 0., 0.], dtype=float32),
 array([-1.3284328e-05, -7.4090644e-07,  7.2679302e-07, ...,
         0.0000000e+00,  0.0000000e+00,  0.0000000e+00], dtype=float32),
 array([ 4.9886359e-05,  1.4642075e-04,  4.4365996e-04, ...,
         6.3584002e-07, -6.2395281e-07,  0.0000000e+00], dtype=float32),
 array([-3.2826196e-04,  4.5522624e-03, -8.2306744e-04, ...,
        -2.2519816e-07, -6.2417300e-08,  0.0000000e+00], dtype=float32),
 array([ 3.1686827e-04,  4.6282235e-04,  1.0160641e-04, ...,
        -1.4605960e-05,  6.6572487e-05,  0.0000000e+00], dtype=float32),
 array([ 0.0000000e+00,  0.0000000e+00,  0.0000000e+00, ...,
        -7.1763244e-09, -2.8297892e-08,  0.0000000e+00], dtype=float32),
 array([0., 0., 0., ..., 0., 0., 0.], dtype=float32),
 array([-2.5870585e-07,  4.6514080e-07, -9.5607948e-07, ...,
         0.0000000e+00,  0.0000000e+00,  0.0000000e+00], dtype=float32),
 array([ 5.788035e-07, -6.493598e-07,  7.111379e-07, ...,  0.000000e+00,
         0.000000e+00,  0.000000e+00], dtype=float32),
 array([ 2.5118000e-04,  1.4220485e-03,  3.9536849e-04, ...,
         4.5242754e-04, -3.1405249e-05,  0.0000000e+00], dtype=float32),
 array([0., 0., 0., ..., 0., 0., 0.], dtype=float32),
 array([ 1.1985266e-07,  2.1360799e-07, -1.1951373e-06, ...,
        -1.3043609e-04,  1.2107374e-06,  0.0000000e+00], dtype=float32),
 array([0.0000000e+00, 0.0000000e+00, 0.0000000e+00, ..., 2.5944988e-08,
        1.2123945e-07, 0.0000000e+00], dtype=float32),
 array([0., 0., 0., ..., 0., 0., 0.], dtype=float32),
 array([0., 0., 0., ..., 0., 0., 0.], dtype=float32),
 array([-2.4280996e-06, -1.2362683e-05, -8.5034850e-07, ...,
        -1.0113516e-11,  5.1403621e-12,  0.0000000e+00], dtype=float32),
 array([9.6098862e-05, 1.6449913e-04, 1.1942573e-04, ..., 0.0000000e+00,
        0.0000000e+00, 0.0000000e+00], dtype=float32),
 array([ 1.3284328e-05,  7.4090644e-07, -7.2679302e-07, ...,
         0.0000000e+00,  0.0000000e+00,  0.0000000e+00], dtype=float32),
 array([ 2.4700081e-05,  2.9454704e-05,  8.0751715e-06, ...,
         1.2746801e-07, -1.6574201e-06,  0.0000000e+00], dtype=float32),
 array([8.4619669e-06, 9.7476968e-06, 2.0182479e-05, ..., 2.1081217e-11,
        4.0220186e-10, 0.0000000e+00], dtype=float32),
 array([0., 0., 0., ..., 0., 0., 0.], dtype=float32))


y:

('08',
 '08',
 '06',
 '05',
 '05',
 '04',
 '06',
 '07',
 '01',
 '04',
 '03',
 '07',
 '03',
 '01',
 '03',
 '03',
 '02',
 '02',
 '02',
 '02',
 '05',
 '06',
 '04',
 '08',
 '07',
 '06',
 '04',
 '05',
 '07',
 '02',
 '08',
 '01',
 '08',
 '03',
 '08',
 '02',
 '03',
 '06',
 '04',
 '07',
 '04',
 '07',
 '05',
 '06',
 '08',
 '08',
 '04',
 '05',
 '05',
 '04',
 '06',
 '07',
 '05',
 '07',
 '01',
 '06',
 '02',
 '02',
 '03',
 '03')


Code for the train/test split and the classifier:

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

from sklearn.tree import DecisionTreeClassifier
dtree = DecisionTreeClassifier()
dtree.fit(X_train, y_train)


Error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-70-b6417fbfb8de> in <module>()
      1 from sklearn.tree import DecisionTreeClassifier
      2 dtree = DecisionTreeClassifier()
----> 3 dtree.fit(X_train, y_train)

/usr/local/lib/python3.6/dist-packages/sklearn/tree/tree.py in fit(self, X, y, sample_weight, check_input, X_idx_sorted)
    788             sample_weight=sample_weight,
    789             check_input=check_input,
--> 790             X_idx_sorted=X_idx_sorted)
    791         return self
    792

/usr/local/lib/python3.6/dist-packages/sklearn/tree/tree.py in fit(self, X, y, sample_weight, check_input, X_idx_sorted)
    114         random_state = check_random_state(self.random_state)
    115         if check_input:
--> 116             X = check_array(X, dtype=DTYPE, accept_sparse="csc")
    117             y = check_array(y, ensure_2d=False, dtype=None)
    118             if issparse(X):

/usr/local/lib/python3.6/dist-packages/sklearn/utils/validation.py in check_array(array, accept_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
    431                                       force_all_finite)
    432     else:
--> 433         array = np.array(array, dtype=dtype, order=order, copy=copy)
    434
    435         if ensure_2d:

ValueError: setting an array element with a sequence.


EDIT 1: I converted both X and y to NumPy arrays, but I still get the same error. Details:

import numpy as np
X = np.asarray(X)
y = np.asarray(y)


X.shape, y.shape


Output:

((60,), (60,))
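The (60,) shape is already a hint: if all 60 arrays in X had the same length n, np.asarray(X) would return shape (60, n). As a minimal sketch (the helper name length_report and the demo data are hypothetical, not from the original post), you can count the row lengths to find the odd ones out:

```python
from collections import Counter

import numpy as np


def length_report(rows):
    """Return a Counter of row lengths and the indices of rows whose length
    differs from the most common one."""
    lengths = [len(row) for row in rows]
    counts = Counter(lengths)
    typical = counts.most_common(1)[0][0]
    odd = [i for i, n in enumerate(lengths) if n != typical]
    return counts, odd


# Hypothetical stand-in for the real X: the middle row is one value longer
X_demo = (
    np.zeros(6, dtype=np.float32),
    np.zeros(7, dtype=np.float32),
    np.zeros(6, dtype=np.float32),
)
counts, odd = length_report(X_demo)
print(counts)  # Counter({6: 2, 7: 1})
print(odd)     # [1]
```

On the real data you would call length_report(X) directly on the tuple (or on the 1-D object array produced by np.asarray).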

Best answer

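It looks like the problem is in your X: one of the arrays it is built from probably has a different length from the others. When scikit-learn converts the tuple you built into a NumPy array inside DecisionTreeClassifier.fit, arrays of unequal length cannot be stacked into a 2-D numeric matrix. NumPy can only treat them as a 1-D array of objects (which is also why np.asarray(X) in your edit gives shape (60,) rather than (60, n_features)), and that is not what the decision tree expects. Asking for a float dtype on such input is what raises the ValueError.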
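A minimal sketch of the failing step, assuming (as the DTYPE argument in the traceback suggests) that scikit-learn's validation asks NumPy for a float32 array; the demo rows are hypothetical:

```python
import numpy as np

# Hypothetical ragged feature rows: the middle one has 7 values, the others 6
rows = (
    np.zeros(6, dtype=np.float32),
    np.zeros(7, dtype=np.float32),
    np.zeros(6, dtype=np.float32),
)

# Rows of equal length stack cleanly into a 2-D float32 matrix
ok = np.array((rows[0], rows[2]), dtype=np.float32)
print(ok.shape)  # (2, 6)

# Ragged rows cannot form a rectangular matrix, so forcing a numeric dtype
# (what scikit-learn's input validation does) raises a ValueError
try:
    np.array(rows, dtype=np.float32)
except ValueError as exc:
    print("ValueError:", exc)
```

The exact message varies by NumPy version (newer releases append a note about an "inhomogeneous shape"), but it starts with "setting an array element with a sequence".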

You can see this with the following snippet:

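import numpy as np
from numpy import array

# Three rows of equal length (6 values each)
X1 = (array([-8.1530527e-10,  8.9952795e-10, -9.1185753e-10,
         0.0000000e+00,  0.0000000e+00,  0.0000000e+00], dtype='float32'),
 array([0., 0., 0., 0., 0., 0.], dtype='float32'),
 array([0., 0., 0., 0., 0., 0.], dtype='float32'))

# Same as X1, but the second row has one extra value (7 instead of 6)
X2 = (array([-8.1530527e-10,  8.9952795e-10, -9.1185753e-10,
         0.0000000e+00,  0.0000000e+00,  0.0000000e+00], dtype='float32'),
 array([0., 0., 0., 0., 0., 0., 1], dtype='float32'),
 array([0., 0., 0., 0., 0., 0.], dtype='float32'))

# Equal lengths -> a 2-D float32 matrix of shape (3, 6)
print("X1:", np.array(X1).dtype, np.array(X1).shape)
# Ragged rows -> a 1-D object array of shape (3,); newer NumPy versions
# require dtype=object explicitly here and raise ValueError without it
print("X2:", np.array(X2, dtype=object).dtype, np.array(X2, dtype=object).shape)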


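Simply appending one more number to the second element of X2 turns the result into a 1-D array of objects (dtype object, shape (3,)) instead of a (3, 6) float32 matrix. Converting that to float32, as scikit-learn's check_array does, is what raises "setting an array element with a sequence".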
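One possible fix, sketched under an assumption the original answer does not make: if the shorter arrays can safely be extended with zeros (many of the printed arrays in X already end in zeros, but whether padding is meaningful depends on how the features were computed), pad every row to a common length before fitting. The helper pad_rows and the demo data are hypothetical; passing a smaller length truncates instead, which is the other common option.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier


def pad_rows(rows, length=None, fill=0.0):
    """Right-pad (or truncate) each 1-D row to `length` values and stack the
    rows into a 2-D float32 matrix. Zero-padding is an assumption here."""
    if length is None:
        length = max(len(r) for r in rows)
    out = np.full((len(rows), length), fill, dtype=np.float32)
    for i, r in enumerate(rows):
        r = np.asarray(r, dtype=np.float32)[:length]
        out[i, :len(r)] = r
    return out


# Hypothetical ragged features and string labels, standing in for the real X and y
X_demo = (
    np.array([0.1, 0.2, 0.0], dtype=np.float32),
    np.array([0.3, 0.0, 0.0, 0.5], dtype=np.float32),
    np.array([0.0, 0.4, 0.0], dtype=np.float32),
    np.array([0.2, 0.1, 0.0, 0.0], dtype=np.float32),
)
y_demo = ("08", "06", "08", "06")

X_fixed = pad_rows(X_demo)
print(X_fixed.shape)  # (4, 4)

# With a rectangular float matrix, fit() no longer raises
clf = DecisionTreeClassifier(random_state=0).fit(X_fixed, y_demo)
print(clf.predict(X_fixed[:1]))
```

String labels such as '08' are accepted by scikit-learn classifiers as they are, so y does not need converting; only X has to be made rectangular.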

07-24 09:52