machine-learning - Getting a specific random forest variable importance measure from the mlr package's resample function

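I am using the mlr package's resample() function to subsample a random forest model 4000 times (code snippet below).

As you can see, I am using the randomForest package to fit the random forest models inside resample().

I would like to get the random forest importance results (mean decrease in accuracy over all classes) for each subsample iteration. The only importance measure I can get right now is the mean decrease in Gini index.

Looking at mlr's source code, the getFeatureImportanceLearner.classif.randomForest() function (line 69), defined alongside makeRLearner.classif.randomForest, calls randomForest::importance() (line 83) to get the importance values from the resulting object of class randomForest. However, the source (line 73) shows that it uses 2L as the default value of type. I want it to use 1L (line 75) instead, which gives the mean decrease in accuracy.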
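To make the difference between the two type values concrete, here is a minimal sketch that calls randomForest::importance() directly, outside mlr. The iris data and the seed are only illustrative choices and are not part of the original question.

```r
library(randomForest)

set.seed(1)  # illustrative seed, only for reproducibility

# importance = TRUE is required so that permutation importance is computed
fit <- randomForest(Species ~ ., data = iris, ntree = 100, importance = TRUE)

# type = 1: mean decrease in accuracy (permutation importance)
acc_imp <- importance(fit, type = 1)

# type = 2: mean decrease in node impurity (Gini index for classification)
gini_imp <- importance(fit, type = 2)

colnames(acc_imp)   # "MeanDecreaseAccuracy"
colnames(gini_imp)  # "MeanDecreaseGini"
```

Both calls return a one-column matrix with one row per feature, which is why mlr takes the first column of the result.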

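How can I pass the value 1L through the resample() function (the "extract = getFeatureImportance" line in the code below) so that getFeatureImportanceLearner.classif.randomForest() receives it and uses ctrl$type = 1L instead of the 2L default set on line 73?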

library(mlr)

rf_task <- makeClassifTask(id = 'task',
                           data = data[, -1], target = 'target_var',
                           positive = 'positive_var')

rf_learner <- makeLearner('classif.randomForest', id = 'random forest',
                          par.vals = list(ntree = 1000, importance = TRUE),
                          predict.type = 'prob')

# rf_boot_desc is not defined in the original post; the line below is an
# assumption based on the "4000 subsamples" described above
rf_boot_desc <- makeResampleDesc('Subsample', iters = 4000)

base_subsample_instance <- makeResampleInstance(rf_boot_desc, rf_task)

rf_subsample_result <- resample(rf_learner, rf_task,
                                base_subsample_instance,
                                extract = getFeatureImportance,
                                measures = list(acc, auc, tpr, tnr,
                                                ppv, npv, f1, brier))

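My solution: I downloaded the source code of the mlr package, changed line 73 of the source file to 1L (https://github.com/mlr-org/mlr/blob/v2.15.0/R/RLearner_classif_randomForest.R), installed the package from the command line, and used that. Not the best solution, but a solution.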

Best answer

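You provide a lot of details that, at least as I understand it, are not actually relevant to your question.
So I wrote a simple MWE that contains the answer.
The idea is that you have to write a short wrapper around getFeatureImportance so that you can pass your own arguments. Fans of purrr can do this with purrr::partial(getFeatureImportance, type = 1), but here I wrote myExtractor by hand.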

library(mlr)
rf_learner <- makeLearner('classif.randomForest', id = 'random forest',
                          par.vals = list(ntree = 100, importance = TRUE),
                          predict.type = 'prob')

measures = list(acc, auc, tpr, tnr,
                ppv, npv, f1, brier)

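# type = 1 requests the mean decrease in accuracy asked for in the question;
# type = 2 (mlr's default) would return the mean decrease in Gini index
myExtractor = function(.model, ...) {
  getFeatureImportance(.model, type = 1, ...)
}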

res = resample(rf_learner, sonar.task, cv10,
               measures = measures, extract = myExtractor)

# first feature importance result:
res$extract[[1]]

# all values in a matrix:
sapply(res$extract, function(x) x$res)
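The wrapper idea above can also be written as a small function factory, so the importance type is chosen at the call site without purrr. This is only a sketch: make_importance_extractor is a hypothetical helper name, not an mlr function, and the 5-iteration subsampling scheme is a small stand-in for the 4000 iterations in the question.

```r
library(mlr)

# Hypothetical helper (not part of mlr): returns an extractor function
# that forwards a fixed `type` to getFeatureImportance()
make_importance_extractor <- function(type) {
  function(.model, ...) getFeatureImportance(.model, type = type, ...)
}

# importance = TRUE is needed for type = 1 (mean decrease in accuracy)
learner <- makeLearner("classif.randomForest",
                       par.vals = list(ntree = 100, importance = TRUE))

# Small subsampling scheme for illustration only
subsample_desc <- makeResampleDesc("Subsample", iters = 5, split = 0.8)

res <- resample(learner, sonar.task, subsample_desc,
                measures = list(acc),
                extract = make_importance_extractor(type = 1L),
                show.info = FALSE)

# One feature importance object per subsample iteration
length(res$extract)
```

The factory keeps the extractor's signature compatible with what resample() expects (a function of the fitted model), while the captured type value travels along in the closure.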

If you want to do bootstrapping, you may also want to look at makeBaggingWrapper instead of solving this through resample.
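As a rough sketch of that alternative, the code below wraps the random forest learner with makeBaggingWrapper and reads the importance from each fitted forest. It assumes that getLearnerModel(..., more.unwrap = TRUE) returns the list of underlying randomForest objects for a bagged model; the iteration count and the sonar.task example data are illustrative choices.

```r
library(mlr)

# Base learner keeps the default predict.type; importance = TRUE stores
# the permutation importance in every fitted forest
base_learner <- makeLearner("classif.randomForest",
                            par.vals = list(ntree = 100, importance = TRUE))

# Fit 5 forests, each on a bootstrap sample of the task
bag_learner <- makeBaggingWrapper(base_learner, bw.iters = 5L,
                                  bw.replace = TRUE)

bag_model <- train(bag_learner, sonar.task)

# Assumption: this unwraps to a list of randomForest objects,
# one per bagging iteration
forests <- getLearnerModel(bag_model, more.unwrap = TRUE)

# Features in rows, bagging iterations in columns;
# type = 1 is the mean decrease in accuracy
imp_matrix <- sapply(forests, function(fit) {
  randomForest::importance(fit, type = 1)[, 1]
})

dim(imp_matrix)
```

Unlike resample(), this route does not produce held-out performance measures, so it fits best when the per-model importance values are the main goal.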

A similar question can be found on Stack Overflow: https://stackoverflow.com/questions/58923130/