R: unclear behavior of the tuneRF function (randomForest package)

Question

I am unsure about the meaning of the stepFactor parameter of the tuneRF function, which is used to tune the mtry parameter that is later passed to the randomForest function.

The documentation of tuneRF says that stepFactor is the magnitude by which the chosen mtry gets deflated or inflated. Obviously, since mtry is the number of variables chosen randomly, it has to be an integer; however, I have seen many examples online that use stepFactor=1.5. At first I thought that R by default takes the next mtry to be floor(mtry_current-stepFactor), but it turned out that I was wrong. Moreover, I do not understand the messages search left... search right... that R displays while tuneRF is working. I thought they indicated whether the mtry parameter was being inflated or deflated, but my suppositions did not turn out to be correct.

To sum up this long and not too graceful description of my doubts, my questions are: why is stepFactor NOT an integer?

How are subsequent mtry values chosen? What does searching left/right actually mean?

Any help would be very much appreciated!! :)

Answer

Below is a summary of how tuneRF works:

  1. a. Set mtry to the default value of sqrt(p) for classification, and p/3 for regression (where p = total number of variables)

b. Compute the OOB error for that default mtry, say error_default

  2. a. Look to the left: set mtry = default value/stepFactor. For instance, if stepFactor=1.5 and your default starting value is 8, mtry would be set to 8/1.5=5.33, rounded up to an integer, which gives 6

b. Compute the OOB error, say error_left

  3. a. Look to the right: set mtry = default value*stepFactor. To continue with my example, mtry would be set to 8*1.5=12

b. Compute the OOB error, say error_right

  4. i. If (error_default < error_right) AND (error_default < error_left), the best mtry is the default value

ii. If the previous condition is not met, but the delta between error_default and error_right/error_left is less than the improve parameter, the best mtry is the default value

iii. Without loss of generality, if neither of the previous conditions is met, error_right < error_left, and (error_default-error_right) > improve, set mtry to mtry_right (12). From now on, always go to the right

  5. If 4.iii. holds, iterate: set mtry to mtry_right*stepFactor (in my example, 12*1.5=18), compute the OOB error and compare it with the error obtained at the previous step (in my example, for mtry=12). If the new error is smaller, and if the gain in error reduction is enough (i.e., > improve), select the new mtry and keep repeating these steps; otherwise stop and return the current mtry as the best mtry
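The iteration described in the last step can be sketched in base R. This is a minimal sketch, not the package's code: search_direction, oob_error and toy_error are hypothetical names, and oob_error stands in for fitting a small forest and reading its OOB error. Two details are assumptions based on my reading of the package: a step to the left rounds up and a step to the right rounds down (clamped to 1 and p), and improve is compared with the relative gain, which is how the tuneRF documentation describes it.

```r
# Hypothetical sketch: keep stepping in one direction while the OOB error
# improves by more than `improve` (relative gain). Not the package's code.
search_direction <- function(start, err_start, oob_error, step_factor,
                             improve, direction, p) {
  best <- start
  err_best <- err_start
  cur <- start
  repeat {
    # Assumed rounding rule: ceiling when deflating, floor when inflating.
    nxt <- if (direction == "left") {
      max(1, ceiling(cur / step_factor))
    } else {
      min(p, floor(cur * step_factor))
    }
    if (nxt == cur) break              # rounding gave the same mtry: stop
    err_new <- oob_error(nxt)
    gain <- 1 - err_new / err_best     # relative improvement
    if (gain <= improve) break         # not enough gain: keep the current best
    best <- nxt
    err_best <- err_new
    cur <- nxt
  }
  list(mtry = best, error = err_best)
}

# Toy stand-in for the OOB error, with its minimum at mtry = 20.
toy_error <- function(mtry) 0.1 + (mtry - 20)^2 / 1000

res <- search_direction(8, toy_error(8), toy_error,
                        step_factor = 1.5, improve = 0.05,
                        direction = "right", p = 64)
# The candidates visited are 12, 18 and 27; res$mtry is 18 for this toy curve.
```

With the toy curve, the search accepts 12 and 18 and stops at 27, where the error rises again, so the sketch returns 18 even though the true minimum (20) is never tried. That is the price of a multiplicative step.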

The smaller the stepFactor you set (e.g., 1.1, 1.2), the more values of mtry you try (fine search); the bigger the stepFactor (e.g., 2, 2.5), the fewer values you try (rough search). Also, with low values of improve, the search will continue longer.
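To see the fine versus rough trade-off, one can list every mtry a one-directional search could visit if the improve test never stopped it early. Again a hedged sketch: candidate_path is a hypothetical helper, and the rounding rule (round up to the left, round down to the right, clamped to 1 and p) is my assumption about how the package steps.

```r
# Hypothetical helper: all mtry values reachable in one direction,
# ignoring the early stop on `improve` (so this is an upper bound).
candidate_path <- function(start, step_factor, direction, p) {
  path <- numeric(0)
  cur <- start
  repeat {
    nxt <- if (direction == "left") {
      max(1, ceiling(cur / step_factor))
    } else {
      min(p, floor(cur * step_factor))
    }
    if (nxt == cur) break   # the step no longer changes mtry
    path <- c(path, nxt)
    cur <- nxt
  }
  path
}

fine  <- candidate_path(8, 1.2, "right", p = 64)  # many closely spaced values
rough <- candidate_path(8, 2.5, "right", p = 64)  # 20, 50, 64
length(fine) > length(rough)                      # TRUE
```

One caveat follows from the assumed rounding: with a step factor very close to 1 the search may not move at all from a small mtry. For example, floor(8 * 1.1) is 8, so candidate_path(8, 1.1, "right", p = 64) is empty.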
