我为正在处理的数据集创建了特征X和标签y。
此时,我想在其上训练随机森林分类器,但在将分类器拟合到训练数据上时遇到了ValueError:setting an array element with a sequence.
在X和y功能以及错误详细信息下面:
X:
(array([-8.1530527e-10, 8.9952795e-10, -9.1185753e-10, ...,
0.0000000e+00, 0.0000000e+00, 0.0000000e+00], dtype=float32),
array([0., 0., 0., ..., 0., 0., 0.], dtype=float32),
array([0., 0., 0., ..., 0., 0., 0.], dtype=float32),
array([-0.00050612, -0.00057967, -0.00035985, ..., 0. ,
0. , 0. ], dtype=float32),
array([ 6.8139506e-08, -2.3837963e-05, -2.4622474e-05, ...,
3.1678758e-06, -2.4535689e-06, 0.0000000e+00], dtype=float32),
array([ 0.0000000e+00, 0.0000000e+00, 0.0000000e+00, ...,
6.9306935e-07, -6.6020442e-07, 0.0000000e+00], dtype=float32),
array([-7.30260945e-05, -1.18022966e-04, -1.08280736e-04, ...,
8.83421380e-05, 4.97258679e-06, 0.00000000e+00], dtype=float32),
array([0., 0., 0., ..., 0., 0., 0.], dtype=float32),
array([ 2.3406714e-05, 3.1186773e-05, 4.9467826e-06, ...,
1.2180173e-07, -9.2944845e-08, 0.0000000e+00], dtype=float32),
array([ 1.1845550e-06, -1.6399191e-06, 2.5565218e-06, ...,
-8.7445065e-09, 5.9859917e-09, 0.0000000e+00], dtype=float32),
array([0., 0., 0., ..., 0., 0., 0.], dtype=float32),
array([-1.3284328e-05, -7.4090644e-07, 7.2679302e-07, ...,
0.0000000e+00, 0.0000000e+00, 0.0000000e+00], dtype=float32),
array([ 0.0000000e+00, 0.0000000e+00, 0.0000000e+00, ...,
5.0694009e-08, -3.4546797e-08, 0.0000000e+00], dtype=float32),
array([ 1.5591205e-07, -1.5845627e-07, 1.5362870e-07, ...,
0.0000000e+00, 0.0000000e+00, 0.0000000e+00], dtype=float32),
array([0., 0., 0., ..., 0., 0., 0.], dtype=float32),
array([0.0000000e+00, 0.0000000e+00, 0.0000000e+00, ..., 1.1608539e-05,
8.2463991e-09, 0.0000000e+00], dtype=float32),
array([-3.6192148e-07, -1.4590451e-05, -5.3999561e-06, ...,
-1.9935460e-05, -3.4417746e-05, 0.0000000e+00], dtype=float32),
array([ 0.0000000e+00, 0.0000000e+00, 0.0000000e+00, ...,
-2.5319534e-07, 2.6521766e-07, 0.0000000e+00], dtype=float32),
array([ 0.0000000e+00, 0.0000000e+00, 0.0000000e+00, ...,
-2.5055220e-08, 1.2936166e-08, 0.0000000e+00], dtype=float32),
array([0., 0., 0., ..., 0., 0., 0.], dtype=float32),
array([ 1.3387315e-05, 6.0913658e-07, -5.6471418e-07, ...,
0.0000000e+00, 0.0000000e+00, 0.0000000e+00], dtype=float32),
array([ 1.7200684e-02, 3.2272514e-02, 3.2961801e-02, ...,
-1.6286784e-06, -8.5592075e-07, 0.0000000e+00], dtype=float32),
array([0., 0., 0., ..., 0., 0., 0.], dtype=float32),
array([ 0.0000000e+00, 0.0000000e+00, 0.0000000e+00, ...,
-3.3923173e-11, 2.8026699e-11, 0.0000000e+00], dtype=float32),
array([-0.00103188, -0.00075814, -0.00051426, ..., 0. ,
0. , 0. ], dtype=float32),
array([ 7.6278877e-07, 2.1624428e-05, 1.1150542e-05, ...,
1.8263392e-09, -1.5558380e-09, 0.0000000e+00], dtype=float32),
array([-1.2111740e-07, 6.3130176e-07, -1.8378003e-06, ...,
1.1309878e-05, 5.4562256e-06, 0.0000000e+00], dtype=float32),
array([0.00026949, 0.00028119, 0.00020081, ..., 0.00032586, 0.00046612,
0. ], dtype=float32),
array([ 0.0000000e+00, 0.0000000e+00, 0.0000000e+00, ...,
-7.8796054e-09, 1.7431153e-08, 0.0000000e+00], dtype=float32),
array([1.42000988e-06, 1.30781755e-05, 2.77493709e-05, ...,
0.00000000e+00, 0.00000000e+00, 0.00000000e+00], dtype=float32),
array([ 2.9161662e-10, -6.3629275e-11, -3.0565092e-10, ...,
0.0000000e+00, 0.0000000e+00, 0.0000000e+00], dtype=float32),
array([ 2.2051008e-05, 1.6838792e-05, 3.5639907e-05, ...,
4.5767497e-06, -1.2002213e-05, 0.0000000e+00], dtype=float32),
array([0., 0., 0., ..., 0., 0., 0.], dtype=float32),
array([ 0.0000000e+00, 0.0000000e+00, 0.0000000e+00, ...,
-2.0104826e-10, 1.6824393e-10, 0.0000000e+00], dtype=float32),
array([ 0.0000000e+00, 0.0000000e+00, 0.0000000e+00, ...,
-4.8303300e-06, -1.2008861e-05, 0.0000000e+00], dtype=float32),
array([0., 0., 0., ..., 0., 0., 0.], dtype=float32),
array([ 0.0000000e+00, 0.0000000e+00, 0.0000000e+00, ...,
-2.7673337e-07, 2.8604177e-07, 0.0000000e+00], dtype=float32),
array([-0.00066044, -0.0009837 , -0.00090796, ..., -0.00171516,
-0.0017666 , 0. ], dtype=float32),
array([ 3.2218946e-11, -5.5296181e-11, 8.9530647e-11, ...,
0.0000000e+00, 0.0000000e+00, 0.0000000e+00], dtype=float32),
array([0., 0., 0., ..., 0., 0., 0.], dtype=float32),
array([-1.3284328e-05, -7.4090644e-07, 7.2679302e-07, ...,
0.0000000e+00, 0.0000000e+00, 0.0000000e+00], dtype=float32),
array([ 4.9886359e-05, 1.4642075e-04, 4.4365996e-04, ...,
6.3584002e-07, -6.2395281e-07, 0.0000000e+00], dtype=float32),
array([-3.2826196e-04, 4.5522624e-03, -8.2306744e-04, ...,
-2.2519816e-07, -6.2417300e-08, 0.0000000e+00], dtype=float32),
array([ 3.1686827e-04, 4.6282235e-04, 1.0160641e-04, ...,
-1.4605960e-05, 6.6572487e-05, 0.0000000e+00], dtype=float32),
array([ 0.0000000e+00, 0.0000000e+00, 0.0000000e+00, ...,
-7.1763244e-09, -2.8297892e-08, 0.0000000e+00], dtype=float32),
array([0., 0., 0., ..., 0., 0., 0.], dtype=float32),
array([-2.5870585e-07, 4.6514080e-07, -9.5607948e-07, ...,
0.0000000e+00, 0.0000000e+00, 0.0000000e+00], dtype=float32),
array([ 5.788035e-07, -6.493598e-07, 7.111379e-07, ..., 0.000000e+00,
0.000000e+00, 0.000000e+00], dtype=float32),
array([ 2.5118000e-04, 1.4220485e-03, 3.9536849e-04, ...,
4.5242754e-04, -3.1405249e-05, 0.0000000e+00], dtype=float32),
array([0., 0., 0., ..., 0., 0., 0.], dtype=float32),
array([ 1.1985266e-07, 2.1360799e-07, -1.1951373e-06, ...,
-1.3043609e-04, 1.2107374e-06, 0.0000000e+00], dtype=float32),
array([0.0000000e+00, 0.0000000e+00, 0.0000000e+00, ..., 2.5944988e-08,
1.2123945e-07, 0.0000000e+00], dtype=float32),
array([0., 0., 0., ..., 0., 0., 0.], dtype=float32),
array([0., 0., 0., ..., 0., 0., 0.], dtype=float32),
array([-2.4280996e-06, -1.2362683e-05, -8.5034850e-07, ...,
-1.0113516e-11, 5.1403621e-12, 0.0000000e+00], dtype=float32),
array([9.6098862e-05, 1.6449913e-04, 1.1942573e-04, ..., 0.0000000e+00,
0.0000000e+00, 0.0000000e+00], dtype=float32),
array([ 1.3284328e-05, 7.4090644e-07, -7.2679302e-07, ...,
0.0000000e+00, 0.0000000e+00, 0.0000000e+00], dtype=float32),
array([ 2.4700081e-05, 2.9454704e-05, 8.0751715e-06, ...,
1.2746801e-07, -1.6574201e-06, 0.0000000e+00], dtype=float32),
array([8.4619669e-06, 9.7476968e-06, 2.0182479e-05, ..., 2.1081217e-11,
4.0220186e-10, 0.0000000e+00], dtype=float32),
array([0., 0., 0., ..., 0., 0., 0.], dtype=float32))
下方的y
('08',
'08',
'06',
'05',
'05',
'04',
'06',
'07',
'01',
'04',
'03',
'07',
'03',
'01',
'03',
'03',
'02',
'02',
'02',
'02',
'05',
'06',
'04',
'08',
'07',
'06',
'04',
'05',
'07',
'02',
'08',
'01',
'08',
'03',
'08',
'02',
'03',
'06',
'04',
'07',
'04',
'07',
'05',
'06',
'08',
'08',
'04',
'05',
'05',
'04',
'06',
'07',
'05',
'07',
'01',
'06',
'02',
'02',
'03',
'03')
分类器的代码加上训练/测试拆分:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
from sklearn.tree import DecisionTreeClassifier
dtree = DecisionTreeClassifier()
dtree.fit(X_train, y_train)
错误:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-70-b6417fbfb8de> in <module>()
1 from sklearn.tree import DecisionTreeClassifier
2 dtree = DecisionTreeClassifier()
----> 3 dtree.fit(X_train, y_train)
/usr/local/lib/python3.6/dist-packages/sklearn/tree/tree.py in fit(self, X, y, sample_weight, check_input, X_idx_sorted)
788 sample_weight=sample_weight,
789 check_input=check_input,
--> 790 X_idx_sorted=X_idx_sorted)
791 return self
792
/usr/local/lib/python3.6/dist-packages/sklearn/tree/tree.py in fit(self, X, y, sample_weight, check_input, X_idx_sorted)
114 random_state = check_random_state(self.random_state)
115 if check_input:
--> 116 X = check_array(X, dtype=DTYPE, accept_sparse="csc")
117 y = check_array(y, ensure_2d=False, dtype=None)
118 if issparse(X):
/usr/local/lib/python3.6/dist-packages/sklearn/utils/validation.py in check_array(array, accept_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
431 force_all_finite)
432 else:
--> 433 array = np.array(array, dtype=dtype, order=order, copy=copy)
434
435 if ensure_2d:
ValueError: setting an array element with a sequence.
EDIT1:我将X和y都转换为numpy数组,但收到的错误是相同的,详细信息如下
import numpy as np
X = np.asarray(X)
y = np.asarray(y)
X.shape, y.shape
输出:
((60,), (60,))
最佳答案
看来问题出在您的X。构成它的数组之一可能具有不同的长度,这会导致您构建的元组,并在由DecisionTreeClassifier处理时由Scikit-learn转换为Numpy数组进行转换转换为字符串向量,这不是决策树函数期望处理的内容。
只需检查以下代码片段即可:
X1 = (array([-8.1530527e-10, 8.9952795e-10, -9.1185753e-10,
0.0000000e+00, 0.0000000e+00, 0.0000000e+00], dtype='float32'),
array([0., 0., 0., 0., 0., 0.], dtype='float32'),
array([0., 0., 0., 0., 0., 0.], dtype='float32'))
X2 = (array([-8.1530527e-10, 8.9952795e-10, -9.1185753e-10,
0.0000000e+00, 0.0000000e+00, 0.0000000e+00], dtype='float32'),
array([0., 0., 0., 0., 0., 0., 1], dtype='float32'),
array([0., 0., 0., 0., 0., 0.], dtype='float32'))
print("X1:", np.array(X1).dtype, "\nX2:", np.array(X2).dtype)
通过仅更改X2的第二个元素并加上另一个数字,就可以使X2数组变成字符串数组(对象类型)。