Problem Description
Can I use AdaBoost with a random forest as the base classifier? I searched the internet and didn't find anyone doing it.
Like in the following code; I tried running it, but it takes a lot of time:
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier

estimators = Pipeline([('vectorizer', CountVectorizer()),
                       ('transformer', TfidfTransformer()),
                       ('classifier', AdaBoostClassifier(learning_rate=1))])

# Random forest intended as the AdaBoost base estimator
RF = RandomForestClassifier(criterion='entropy', n_estimators=100,
                            max_depth=500, min_samples_split=100,
                            max_leaf_nodes=None, max_features='log2')

param_grid = {
    'vectorizer__ngram_range': [(1, 2), (1, 3)],
    'vectorizer__min_df': [5],
    'vectorizer__max_df': [0.7],
    'vectorizer__max_features': [1500],
    'transformer__use_idf': [True, False],
    'transformer__norm': ('l1', 'l2'),
    'transformer__smooth_idf': [True, False],
    'transformer__sublinear_tf': [True, False],
    'classifier__base_estimator': [RF],
    'classifier__algorithm': ("SAMME.R", "SAMME"),
    'classifier__n_estimators': [4, 7, 11, 13, 16, 19, 22, 25, 28, 31, 34, 43, 50]
}
I tried this with GridSearchCV, adding the RF classifier into the AdaBoost parameters. If I use it, would the accuracy increase?
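Roughly, the search is wired up like this (a sketch only - X_train and y_train stand in for my training texts and labels, which are not shown here, and the cv/scoring settings are just examples):

from sklearn.model_selection import GridSearchCV

# Exhaustive search: every grid combination is fit once per CV fold.
grid_search = GridSearchCV(estimators, param_grid=param_grid,
                           cv=5, scoring='accuracy', n_jobs=-1)  # example settings
grid_search.fit(X_train, y_train)  # placeholders for the actual data
print(grid_search.best_params_, grid_search.best_score_)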
Recommended Answer

No wonder you have not actually seen anyone doing it - it is an absurd and bad idea.
You are trying to build an ensemble (Adaboost) which in itself consists of ensemble base classifiers (RFs) - essentially an "ensemble-squared"; so, no wonder about the high computation time: with n_estimators=100 in the RF and up to 50 Adaboost iterations, a single fit can train up to 5,000 trees, and the grid above spans 832 parameter combinations, each fit once per cross-validation fold.
But even if it were practical, there are good theoretical reasons not to do it; quoting from my own answer in Execution time of AdaBoost with SVM base classifier:
Adaboost (and similar ensemble methods) were conceived using decision trees as base classifiers (more specifically, decision stumps, i.e. DTs with a depth of only 1); there is good reason why, still today, if you don't explicitly specify the base_classifier argument, it takes a value of DecisionTreeClassifier(max_depth=1). DTs are suitable for such ensembling because they are essentially unstable classifiers, which is not the case with SVMs, hence the latter are not expected to offer much when used as base classifiers.

On top of this, SVMs are computationally much more expensive than decision trees (let alone decision stumps), which is the reason for the long processing times you have observed.
The argument holds for RFs, too - they are not unstable classifiers, hence there is not any reason to actually expect performance improvements when using them as base classifiers for boosting algorithms, like Adaboost.
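To see the contrast concretely, here is a minimal sketch of the conventional setup - Adaboost over its default decision stumps (variable names are placeholders, not code from the question):

from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

# The default base estimator: a decision stump (a tree of depth 1),
# an intentionally weak, unstable learner that boosting can improve.
stump_ada = AdaBoostClassifier(
    base_estimator=DecisionTreeClassifier(max_depth=1),  # same as the default
    n_estimators=50,
    learning_rate=1,
)
# stump_ada.fit(X_train, y_train)  # X_train/y_train: placeholder data

With 50 stumps, each fit trains 50 single-split trees, versus up to 5,000 deep forest trees per fit in the ensemble-squared setup above.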