Question
How can I control the size of the subsample used to train each tree in the forest? According to the scikit-learn documentation:
So bootstrap allows randomness, but I can't find how to control the size of the subsample.
Recommended answer
Scikit-learn did not provide this option when this answer was written (since version 0.22, RandomForestClassifier accepts a max_samples parameter). You can get the same effect with a (slower) combination of a decision tree and the bagging meta-classifier:
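from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
# The tree is passed positionally because the base_estimator keyword was renamed to estimator in scikit-learn 1.2.
# max_samples=0.5 trains each tree on a sample half the size of the training set.
clf = BaggingClassifier(DecisionTreeClassifier(), max_samples=0.5)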
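A slightly fuller sketch of the same idea, for illustration only. The synthetic dataset, n_estimators=25, and random_state=0 are assumptions and not part of the original answer. Setting max_features="sqrt" on the tree is also an addition: it makes each split consider a random subset of features, which is closer to what a random forest does than plain bagging of full trees.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic data, purely for illustration.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# max_features="sqrt" makes each split consider a random feature subset,
# as a random forest does; plain bagging would consider all features.
tree = DecisionTreeClassifier(max_features="sqrt")

# max_samples=0.5 asks for a sample half the size of the training set per tree.
clf = BaggingClassifier(tree, n_estimators=25, max_samples=0.5, random_state=0)
clf.fit(X, y)

# Number of rows drawn (with replacement by default) for the first few trees.
sizes = [len(indices) for indices in clf.estimators_samples_]
print(sizes[:3])
```

max_samples also accepts an integer, in which case it is read as an absolute number of rows rather than a fraction.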
As a side note, Breiman's random forest indeed doesn't treat the subsample size as a parameter. It relies entirely on the bootstrap, so each tree is built from approximately a fraction (1 - 1/e), about 63%, of the distinct training samples.
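The (1 - 1/e) figure follows from the fact that, when n rows are drawn with replacement from n rows, a given row is missed by every draw with probability (1 - 1/n)^n, which approaches 1/e as n grows. A minimal simulation using only the standard library can check this. The helper name unique_fraction, the sample size, and the number of repetitions are arbitrary choices made for this sketch.

```python
import math
import random


def unique_fraction(n_samples, rng):
    """Fraction of distinct rows in one bootstrap sample of size n_samples."""
    drawn = {rng.randrange(n_samples) for _ in range(n_samples)}
    return len(drawn) / n_samples


rng = random.Random(0)
n_samples = 10_000

# Average the distinct-row fraction over several bootstrap samples.
fractions = [unique_fraction(n_samples, rng) for _ in range(20)]
mean_fraction = sum(fractions) / len(fractions)
expected = 1 - 1 / math.e

print(f"simulated: {mean_fraction:.3f}, 1 - 1/e: {expected:.3f}")
```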