I want to predict electricity consumption using a random forest. After preparing the data, its current state is as follows:
X=df[['Temp(⁰C)','Araç Sayısı (adet)','Montaj V362_WH','Montaj V363_WH','Montaj_Temp','avg_humidity']]
X.head(15)
Output:
Temp(⁰C) Araç Sayısı (adet) Montaj V362_WH Montaj V363_WH Montaj_Temp avg_humidity
0 3.250000 0.0 0.0 0.0 17.500000 88.250000
1 3.500000 868.0 16.0 18.0 20.466667 82.316667
2 3.958333 774.0 18.0 18.0 21.166667 87.533333
3 6.541667 0.0 0.0 0.0 18.900000 83.916667
4 4.666667 785.0 16.0 18.0 20.416667 72.650000
5 2.458333 813.0 18.0 18.0 21.166667 73.983333
6 -0.458333 804.0 16.0 18.0 20.500000 72.150000
7 -1.041667 850.0 16.0 16.0 19.850000 76.433333
8 -0.375000 763.0 16.0 18.0 20.500000 76.583333
9 4.375000 1149.0 16.0 16.0 21.416667 84.300000
10 8.541667 0.0 0.0 0.0 21.916667 71.650000
11 6.625000 763.0 16.0 18.0 22.833333 73.733333
12 5.333333 783.0 16.0 16.0 22.166667 69.250000
13 4.708333 764.0 16.0 18.0 21.583333 66.800000
14 4.208333 813.0 16.0 16.0 20.750000 68.150000
y.head(15)
Output:
Montaj_ET_kWh/day
0 11951.0
1 41821.0
2 42534.0
3 14537.0
4 41305.0
5 42295.0
6 44923.0
7 44279.0
8 45752.0
9 44432.0
10 25786.0
11 42203.0
12 40676.0
13 39980.0
14 39404.0
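from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.30, random_state=None)
clf = RandomForestRegressor(n_estimators=10000, random_state=0, n_jobs=-1)
clf.fit(X_train, y_train['Montaj_ET_kWh/day'])
# feature_list is not defined in the post; assumed here to be the column names of X
feature_list = list(X.columns)
for feature in zip(feature_list, clf.feature_importances_):
    print(feature)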
Output:
('Temp(⁰C)', 0.11598075020423881)
('Araç Sayısı (adet)', 0.7047301384616493)
('Montaj V362_WH', 0.04065706901940535)
('Montaj V363_WH', 0.023077554218712878)
('Montaj_Temp', 0.08082006262985514)
('avg_humidity', 0.03473442546613837)
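from sklearn.feature_selection import SelectFromModel
# Keep only the features whose importance is at least 0.10
sfm = SelectFromModel(clf, threshold=0.10)
sfm.fit(X_train, y_train['Montaj_ET_kWh/day'])
for feature_list_index in sfm.get_support(indices=True):
    print(feature_list[feature_list_index])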
Output:
Temp(⁰C)
Araç Sayısı (adet)
X_important_train = sfm.transform(X_train)
X_important_test = sfm.transform(X_test)
clf_important = RandomForestRegressor(n_estimators=10000, random_state=0, n_jobs=-1)
clf_important.fit(X_important_train, y_train)
y_test=y_test.values
y_pred = clf.predict(X_test)
y_test=y_test.reshape(-1,1)
y_pred=y_pred.reshape(-1,1)
y_test=y_test.ravel()
y_pred=y_pred.ravel()
label_encoder = LabelEncoder()
y_pred = label_encoder.fit_transform(y_pred)
y_test = label_encoder.fit_transform(y_test)
accuracy_score(y_test, y_pred)
Output:
0.010964912280701754
I don't know why the accuracy is so low, and I can't tell where I went wrong.
Best answer
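Your mistake is that you are asking for accuracy (a classification metric) in a regression setting, which is meaningless.
From the accuracy_score documentation (emphasis added):
sklearn.metrics.accuracy_score(y_true, y_pred, normalize=True, sample_weight=None)
Accuracy *classification* score.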
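To illustrate why accuracy breaks down here, the following is a minimal sketch using made-up values (not the asker's data). Accuracy counts exact matches, so continuous predictions that are each within about 1% of the truth still score zero. Encoding each array separately with LabelEncoder, as in the question, only replaces the values with their sort ranks, so the resulting "accuracy" measures rank agreement rather than prediction quality.

```python
import numpy as np
from sklearn.metrics import accuracy_score, r2_score
from sklearn.preprocessing import LabelEncoder

# Hypothetical daily kWh values, chosen only for illustration.
y_true = np.array([41821.0, 42534.0, 14537.0, 41305.0])
# Predictions within about 1% of the truth, but never exactly equal.
y_pred = y_true * np.array([1.01, 0.99, 1.005, 0.995])

# accuracy_score refuses continuous predictions outright.
try:
    accuracy_score(y_true, y_pred)
    accuracy_error = None
except ValueError as exc:
    accuracy_error = str(exc)

# The quantity accuracy is meant to measure: the share of exact matches.
exact_match_rate = float(np.mean(y_true == y_pred))

# Fitting a separate LabelEncoder on each array maps values to sort ranks,
# so this score only says how often the two rank orderings coincide.
rank_true = LabelEncoder().fit_transform(y_true)
rank_pred = LabelEncoder().fit_transform(y_pred)
rank_accuracy = accuracy_score(rank_true, rank_pred)

# A regression metric reflects how close the predictions actually are.
r2 = r2_score(y_true, y_pred)

print("accuracy_score error:", accuracy_error)
print("exact match rate:", exact_match_rate)
print("accuracy on separately encoded ranks:", rank_accuracy)
print("R^2:", r2)
```
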
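Check the list of metrics available in scikit-learn for suitable regression metrics (there you can also confirm that accuracy is used only for classification); for more details, see my answer in Accuracy Score ValueError: Can't Handle mix of binary and continuous target.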
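As a rough sketch of what evaluating with regression metrics could look like, the example below trains a RandomForestRegressor on synthetic data and reports MAE, RMSE and R². The data-generating formula, the feature names and the smaller n_estimators are assumptions made for illustration; they are not taken from the question.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for the question's data (assumed relationship, not real data):
# consumption grows with vehicle count and falls slightly with temperature.
n = 300
vehicles = rng.integers(0, 1200, size=n).astype(float)
temp = rng.uniform(-5, 10, size=n)
X = np.column_stack([temp, vehicles])
y = 12000 + 35 * vehicles - 200 * temp + rng.normal(0, 1500, size=n)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=0
)

reg = RandomForestRegressor(n_estimators=200, random_state=0)
reg.fit(X_train, y_train)
y_pred = reg.predict(X_test)

# Regression metrics: errors are in the target's units (kWh/day here),
# while R^2 is unitless.
mae = mean_absolute_error(y_test, y_pred)
rmse = float(np.sqrt(mean_squared_error(y_test, y_pred)))
r2 = r2_score(y_test, y_pred)

print(f"MAE:  {mae:.1f}")
print(f"RMSE: {rmse:.1f}")
print(f"R^2:  {r2:.3f}")
```

In the question's own code, the same three calls could be applied directly to y_test and y_pred as returned by predict, without the LabelEncoder step.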
Source: the Stack Overflow question "python - Random forest accuracy too low": https://stackoverflow.com/questions/55442721/