I'm a beginner working on my first real ML algorithm. Sorry if this is a duplicate, but I couldn't find an answer to it.
I have the following DataFrame (df):

index    Feature1  Feature2  Feature3  Target
001       01         01        03        0
002       03         03        01        1
003       03         02        02        1

My code is as follows:
data = df[['Feature1', 'Feature2', 'Feature3']]
labels = df['Target']
X_train, X_test, y_train, y_test = train_test_split(data, labels, test_size = 0.8)

clf = RandomForestClassifier().fit(X_train, y_train)

prediction_of_probability = clf.predict_proba(X_test)

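What I'm struggling with is how to get 'prediction_of_probability' back into the DataFrame.
I understand that the predictions don't cover every row of the original DataFrame.
Thanks in advance for helping a newbie like me!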

Best answer

You can keep track of the indices of the train and test sets, then use them to put the results back together, as follows:

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

data = df[['Feature1', 'Feature2', 'Feature3']]
labels = df['Target']
indices = df.index.values

# Split the indices instead of the labels so the row order of the split is kept.
X_train, X_test, indices_train, indices_test = train_test_split(
    data, indices, test_size=0.33, random_state=42
)

# Look up the labels by index for each split.
y_train, y_test = labels.loc[indices_train], labels.loc[indices_test]


clf = RandomForestClassifier().fit(X_train, y_train)

prediction_of_probability = clf.predict_proba(X_test)
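
As a side note to the split above: when train_test_split is given a pandas DataFrame, the returned pieces keep their original row labels, so X_test.index can serve the same purpose as a separately tracked index array. The following is a minimal sketch of that idea. The eight-row toy frame and the stratify argument are assumptions added for illustration and are not part of the question.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data (not from the question), large enough that
# both classes can appear in each split.
df = pd.DataFrame({
    "Feature1": [1, 3, 3, 1, 2, 3, 1, 2],
    "Feature2": [1, 3, 2, 2, 3, 3, 1, 1],
    "Feature3": [3, 1, 2, 3, 1, 2, 3, 3],
    "Target":   [0, 1, 1, 0, 1, 1, 0, 0],
})

data = df[["Feature1", "Feature2", "Feature3"]]
labels = df["Target"]

# stratify=labels is an assumption added here so each split contains both classes.
X_train, X_test, y_train, y_test = train_test_split(
    data, labels, test_size=0.25, random_state=42, stratify=labels
)

# The splits carry the original row labels of df.
test_rows = X_test.index
train_rows = X_train.index
print(sorted(test_rows), sorted(train_rows))
```

With the row labels available this way, the later df_new.loc[...] assignments could use X_test.index and X_train.index in place of indices_test and indices_train.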

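Then you can put the probabilities into a new DataFrame, df_new. Note that predict_proba returns a 2-D array with one column per class, so when the model has seen more than one class, select a single column (for example prediction_of_probability[:, 1]) before assigning it to one DataFrame column.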
>>> df_new = df.copy()
>>> df_new.loc[indices_test,'pred_test'] = prediction_of_probability # clf.predict_proba(X_test)
>>> print(df_new)

   Feature1  Feature2  Feature3  Target  pred_test
1         3         3         1       1        NaN
2         3         2         2       1        NaN
0         1         1         3       0        1.0

Or even the predictions for the training set:
>>> df_new.loc[indices_train,'pred_train'] = clf.predict_proba(X_train)
>>> print(df_new)

   Feature1  Feature2  Feature3  Target  pred_test  pred_train
1         3         3         1       1        NaN         1.0
2         3         2         2       1        NaN         1.0
0         1         1         3       0        1.0         NaN

Or, if you want to combine the train and test probabilities, just use the same column name for both (e.g. pred).
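
A minimal sketch of that combined column, assuming a larger hypothetical toy frame so both classes are present. The split column is an extra added here so the in-sample training predictions can still be told apart from the test ones. It is not part of the original answer.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Hypothetical toy data (not from the question).
df = pd.DataFrame({
    "Feature1": [1, 3, 3, 1, 2, 3, 1, 2],
    "Feature2": [1, 3, 2, 2, 3, 3, 1, 1],
    "Feature3": [3, 1, 2, 3, 1, 2, 3, 3],
    "Target":   [0, 1, 1, 0, 1, 1, 0, 0],
})

data = df[["Feature1", "Feature2", "Feature3"]]
labels = df["Target"]

X_train, X_test, y_train, y_test = train_test_split(
    data, labels, test_size=0.25, random_state=42, stratify=labels
)

clf = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Position of class 1 among the columns returned by predict_proba.
pos = list(clf.classes_).index(1)

df_new = df.copy()
# Same column name for both splits, so every row ends up with a value.
df_new.loc[X_test.index, "pred"] = clf.predict_proba(X_test)[:, pos]
df_new.loc[X_train.index, "pred"] = clf.predict_proba(X_train)[:, pos]

# Optional marker (an addition for illustration) recording each row's split.
df_new["split"] = "train"
df_new.loc[X_test.index, "split"] = "test"
print(df_new)
```

Keep in mind that the values for training rows come from data the model has already seen, so they tend to look more confident than the test ones.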
