Subsample size in scikit-learn's RandomForestClassifier

Question

How can I control the size of the subsample used to train each tree in the forest? According to the scikit-learn documentation:

So bootstrap allows randomness, but I can't find a way to control the size of the subsample.

Recommended answer

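When this answer was written, scikit-learn did not provide this option for RandomForestClassifier (a max_samples parameter was added to it in version 0.22). You can still get the same control with a slower alternative that combines a decision tree with the bagging meta-classifier: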

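from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# The tree is passed positionally because the keyword was renamed from
# base_estimator to estimator in scikit-learn 1.2.
# max_samples=0.5 trains each tree on half of the training rows.
clf = BaggingClassifier(DecisionTreeClassifier(), max_samples=0.5)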
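To check that the subsample size is really being controlled, here is a minimal sketch. The synthetic dataset, the variable names, and the choice of 10 trees are assumptions made for illustration and are not part of the original answer. Setting max_features="sqrt" on the tree is an optional addition that brings the bagged trees closer to a random forest's per-split feature sampling. The last part assumes scikit-learn 0.22 or newer, where the forest accepts max_samples directly.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic data, purely for illustration (not from the original answer).
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Bagged trees: each tree is fit on 50% of the rows, drawn with replacement
# (bootstrap=True is the default for BaggingClassifier).
bagged = BaggingClassifier(
    DecisionTreeClassifier(max_features="sqrt"),
    n_estimators=10,
    max_samples=0.5,
    random_state=0,
).fit(X, y)

# estimators_samples_ holds the row indices drawn for each tree.
subsample_sizes = [len(idx) for idx in bagged.estimators_samples_]
print(subsample_sizes)

# Assumption: scikit-learn >= 0.22, where the forest takes max_samples itself.
forest = RandomForestClassifier(
    n_estimators=10, max_samples=0.5, random_state=0
).fit(X, y)
print(len(forest.estimators_))
```

With a float, max_samples is read as a fraction of the training rows. An integer is read as an absolute number of rows.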

As a side note, Breiman's random forest indeed does not treat the subsample size as a parameter. It relies entirely on the bootstrap, so each tree is built from approximately 1 - 1/e (about 63.2%) of the distinct training samples.
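The 1 - 1/e figure can be checked with a short simulation. This is a sketch using NumPy only. The sample size of 10,000 and the seed are arbitrary choices for illustration.

```python
import math

import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# A bootstrap sample: n row indices drawn with replacement from n rows.
bootstrap_indices = rng.integers(0, n, size=n)

# Fraction of distinct rows that appear at least once in the sample.
unique_fraction = np.unique(bootstrap_indices).size / n
expected_fraction = 1 - 1 / math.e

print(f"simulated: {unique_fraction:.3f}, 1 - 1/e: {expected_fraction:.3f}")
```

The reasoning behind it: a given row is missed by a single draw with probability 1 - 1/n, so it is missed by all n draws with probability (1 - 1/n)^n, which tends to 1/e as n grows.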
