python - RandomForest high OOB score vs. low KFold validation score

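I have been training a random forest model on the Titanic dataset.
Many articles say that cross-validation is unnecessary for an RF classifier, while a few say you can still use it. I tried both approaches, but I don't understand how the scores are computed, and I'm not sure whether my model is sound if I use it without cross-validation.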

The model's OOB score is 96.85 and the mean cross-validation score is 83.27 [if I set scoring to 'f1', the score is 74.01].

Here is my code:

from sklearn.ensemble import RandomForestClassifier

clf = RandomForestClassifier(n_estimators=10, random_state=44, oob_score=True)

clf.fit(titanic[features], titanic['Survived'])

clf.score(titanic[features], titanic['Survived'])

score : 0.9685746352413019

from sklearn.model_selection import KFold, cross_val_score

predictors = features
clf = RandomForestClassifier(random_state=10, n_estimators=10)
clf.fit(titanic[features],titanic["Survived"])

kf = KFold(n_splits=10)

scores = cross_val_score(clf, titanic[predictors], titanic["Survived"], cv=kf)

print(scores.mean())
score : 83.27

Can someone please explain these scores?

Thanks!

Best answer

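clf.score does not return the OOB score; it returns the accuracy on the data you pass in, which here is the training data. A random forest fits its training data very closely, so that number is optimistic.
The OOB score is available through the clf.oob_score_ attribute (an attribute of the fitted model, not a method). It is usually much closer to the cross-validation score.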
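
To make the difference concrete, here is a minimal sketch that compares the three numbers side by side. It does not use the asker's Titanic data: the dataset is a synthetic stand-in generated with make_classification, and its size, feature count, and noise level are assumptions chosen only for illustration. It also uses n_estimators=100 rather than 10, since with very few trees some samples may never be out-of-bag and scikit-learn warns that the OOB estimate is unreliable.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in for the Titanic data (an assumption, not the asker's dataset).
X, y = make_classification(
    n_samples=891, n_features=8, n_informative=4, flip_y=0.2, random_state=0
)

clf = RandomForestClassifier(n_estimators=100, oob_score=True, random_state=44)
clf.fit(X, y)

# Accuracy on the same rows the forest was trained on: optimistic.
train_acc = clf.score(X, y)

# Out-of-bag accuracy: each row is scored only by trees that did not see it.
oob_acc = clf.oob_score_

# 10-fold cross-validated accuracy on a fresh, unfitted estimator.
kf = KFold(n_splits=10, shuffle=True, random_state=0)
cv_acc = cross_val_score(
    RandomForestClassifier(n_estimators=100, random_state=44), X, y, cv=kf
).mean()

print(f"training accuracy: {train_acc:.4f}")
print(f"OOB accuracy:      {oob_acc:.4f}")
print(f"10-fold CV mean:   {cv_acc:.4f}")
```

On data like this, the training accuracy should come out well above the other two, while the OOB and cross-validation figures should land near each other. That is the pattern in the question: the 96.85 is a training score, and the 83.27 is the honest estimate.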

This question, "python - RandomForest high OOB score vs. low KFold validation score", comes from a similar question on Stack Overflow: https://stackoverflow.com/questions/59996682/
