Problem Description
Can I use AdaBoost with a random forest as the base classifier? I searched the internet and didn't find anyone doing it.
Like in the following code; I tried running it, but it takes a lot of time:
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier

estimators = Pipeline([('vectorizer', CountVectorizer()),
                       ('transformer', TfidfTransformer()),
                       ('classifier', AdaBoostClassifier(learning_rate=1))])

# Random forest intended as the AdaBoost base estimator
RF = RandomForestClassifier(criterion='entropy', n_estimators=100,
                            max_depth=500, min_samples_split=100,
                            max_leaf_nodes=None, max_features='log2')

param_grid = {
    'vectorizer__ngram_range': [(1, 2), (1, 3)],
    'vectorizer__min_df': [5],
    'vectorizer__max_df': [0.7],
    'vectorizer__max_features': [1500],
    'transformer__use_idf': [True, False],
    'transformer__norm': ('l1', 'l2'),
    'transformer__smooth_idf': [True, False],
    'transformer__sublinear_tf': [True, False],
    'classifier__base_estimator': [RF],
    'classifier__algorithm': ("SAMME.R", "SAMME"),
    'classifier__n_estimators': [4, 7, 11, 13, 16, 19, 22, 25, 28, 31, 34, 43, 50]
}
I tried this with GridSearchCV, adding the RF classifier into the AdaBoost parameters. If I use it, would the accuracy increase?
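Roughly, the search is wired up like this (a sketch only - X_train and y_train stand in for my training texts and labels, which are not shown here, and the cv/scoring settings are just examples):

from sklearn.model_selection import GridSearchCV

# Exhaustive search: every grid combination is fit once per CV fold.
grid_search = GridSearchCV(estimators, param_grid=param_grid,
                           cv=5, scoring='accuracy', n_jobs=-1)  # example settings
grid_search.fit(X_train, y_train)  # placeholders for the actual data
print(grid_search.best_params_, grid_search.best_score_)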
Recommended Answer

No wonder you have not actually seen anyone doing it - it is an absurd and bad idea.
You are trying to build an ensemble (Adaboost) which in itself consists of ensemble base classifiers (RFs) - essentially an "ensemble-squared"; so, no wonder about the high computation time: with n_estimators=100 in the RF and up to 50 Adaboost iterations, a single fit can train up to 5,000 trees, and the grid above spans 832 parameter combinations, each fit once per cross-validation fold.
But even if it were practical, there are good theoretical reasons not to do it; quoting from my own answer in Execution time of AdaBoost with SVM base classifier:
Adaboost (and similar ensemble methods) were conceived using decision trees as base classifiers (more specifically, decision stumps, i.e. DTs with a depth of only 1); there is good reason why, still today, if you don't explicitly specify the base_classifier argument, it takes a value of DecisionTreeClassifier(max_depth=1). DTs are suitable for such ensembling because they are essentially unstable classifiers, which is not the case with SVMs, hence the latter are not expected to offer much when used as base classifiers.

On top of this, SVMs are computationally much more expensive than decision trees (let alone decision stumps), which is the reason for the long processing times you have observed.
The argument holds for RFs, too - they are not unstable classifiers, hence there is not any reason to actually expect performance improvements when using them as base classifiers for boosting algorithms, like Adaboost.
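To see the contrast concretely, here is a minimal sketch of the conventional setup - Adaboost over its default decision stumps (variable names are placeholders, not code from the question):

from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

# The default base estimator: a decision stump (a tree of depth 1),
# an intentionally weak, unstable learner that boosting can improve.
stump_ada = AdaBoostClassifier(
    base_estimator=DecisionTreeClassifier(max_depth=1),  # same as the default
    n_estimators=50,
    learning_rate=1,
)
# stump_ada.fit(X_train, y_train)  # X_train/y_train: placeholder data

With 50 stumps, each fit trains 50 single-split trees, versus up to 5,000 deep forest trees per fit in the ensemble-squared setup above.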