I'm working on the Coursera Machine Learning project. The goal is to perform predictive modeling on the following data set.

> summary(training)
   roll_belt        pitch_belt          yaw_belt       total_accel_belt  gyros_belt_x
 Min.   :-28.90   Min.   :-55.8000   Min.   :-180.00   Min.   : 0.00    Min.   :-1.040000
 1st Qu.:  1.10   1st Qu.:  1.7600   1st Qu.: -88.30   1st Qu.: 3.00    1st Qu.:-0.030000
 Median :113.00   Median :  5.2800   Median : -13.00   Median :17.00    Median : 0.030000
 Mean   : 64.41   Mean   :  0.3053   Mean   : -11.21   Mean   :11.31    Mean   :-0.005592
 3rd Qu.:123.00   3rd Qu.: 14.9000   3rd Qu.:  12.90   3rd Qu.:18.00    3rd Qu.: 0.110000
 Max.   :162.00   Max.   : 60.3000   Max.   : 179.00   Max.   :29.00    Max.   : 2.220000
  gyros_belt_y       gyros_belt_z      accel_belt_x       accel_belt_y     accel_belt_z     magnet_belt_x
 Min.   :-0.64000   Min.   :-1.4600   Min.   :-120.000   Min.   :-69.00   Min.   :-275.00   Min.   :-52.0
 1st Qu.: 0.00000   1st Qu.:-0.2000   1st Qu.: -21.000   1st Qu.:  3.00   1st Qu.:-162.00   1st Qu.:  9.0
 Median : 0.02000   Median :-0.1000   Median : -15.000   Median : 35.00   Median :-152.00   Median : 35.0
 Mean   : 0.03959   Mean   :-0.1305   Mean   :  -5.595   Mean   : 30.15   Mean   : -72.59   Mean   : 55.6
 3rd Qu.: 0.11000   3rd Qu.:-0.0200   3rd Qu.:  -5.000   3rd Qu.: 61.00   3rd Qu.:  27.00   3rd Qu.: 59.0
 Max.   : 0.64000   Max.   : 1.6200   Max.   :  85.000   Max.   :164.00   Max.   : 105.00   Max.   :485.0
 magnet_belt_y   magnet_belt_z       roll_arm         pitch_arm          yaw_arm          total_accel_arm
 Min.   :354.0   Min.   :-623.0   Min.   :-180.00   Min.   :-88.800   Min.   :-180.0000   Min.   : 1.00
 1st Qu.:581.0   1st Qu.:-375.0   1st Qu.: -31.77   1st Qu.:-25.900   1st Qu.: -43.1000   1st Qu.:17.00
 Median :601.0   Median :-320.0   Median :   0.00   Median :  0.000   Median :   0.0000   Median :27.00
 Mean   :593.7   Mean   :-345.5   Mean   :  17.83   Mean   : -4.612   Mean   :  -0.6188   Mean   :25.51
 3rd Qu.:610.0   3rd Qu.:-306.0   3rd Qu.:  77.30   3rd Qu.: 11.200   3rd Qu.:  45.8750   3rd Qu.:33.00
 Max.   :673.0   Max.   : 293.0   Max.   : 180.00   Max.   : 88.500   Max.   : 180.0000   Max.   :66.00
  gyros_arm_x        gyros_arm_y       gyros_arm_z       accel_arm_x       accel_arm_y
 Min.   :-6.37000   Min.   :-3.4400   Min.   :-2.3300   Min.   :-404.00   Min.   :-318.0
 1st Qu.:-1.33000   1st Qu.:-0.8000   1st Qu.:-0.0700   1st Qu.:-242.00   1st Qu.: -54.0
 Median : 0.08000   Median :-0.2400   Median : 0.2300   Median : -44.00   Median :  14.0
 Mean   : 0.04277   Mean   :-0.2571   Mean   : 0.2695   Mean   : -60.24   Mean   :  32.6
 3rd Qu.: 1.57000   3rd Qu.: 0.1400   3rd Qu.: 0.7200   3rd Qu.:  84.00   3rd Qu.: 139.0
 Max.   : 4.87000   Max.   : 2.8400   Max.   : 3.0200   Max.   : 437.00   Max.   : 308.0
  accel_arm_z       magnet_arm_x     magnet_arm_y     magnet_arm_z    roll_dumbbell     pitch_dumbbell
 Min.   :-636.00   Min.   :-584.0   Min.   :-392.0   Min.   :-597.0   Min.   :-153.71   Min.   :-149.59
 1st Qu.:-143.00   1st Qu.:-300.0   1st Qu.:  -9.0   1st Qu.: 131.2   1st Qu.: -18.49   1st Qu.: -40.89
 Median : -47.00   Median : 289.0   Median : 202.0   Median : 444.0   Median :  48.17   Median : -20.96
 Mean   : -71.25   Mean   : 191.7   Mean   : 156.6   Mean   : 306.5   Mean   :  23.84   Mean   : -10.78
 3rd Qu.:  23.00   3rd Qu.: 637.0   3rd Qu.: 323.0   3rd Qu.: 545.0   3rd Qu.:  67.61   3rd Qu.:  17.50
 Max.   : 292.00   Max.   : 782.0   Max.   : 583.0   Max.   : 694.0   Max.   : 153.55   Max.   : 149.40
  yaw_dumbbell      total_accel_dumbbell gyros_dumbbell_x    gyros_dumbbell_y   gyros_dumbbell_z
 Min.   :-150.871   Min.   : 0.00        Min.   :-204.0000   Min.   :-2.10000   Min.   : -2.380
 1st Qu.: -77.644   1st Qu.: 4.00        1st Qu.:  -0.0300   1st Qu.:-0.14000   1st Qu.: -0.310
 Median :  -3.324   Median :10.00        Median :   0.1300   Median : 0.03000   Median : -0.130
 Mean   :   1.674   Mean   :13.72        Mean   :   0.1611   Mean   : 0.04606   Mean   : -0.129
 3rd Qu.:  79.643   3rd Qu.:19.00        3rd Qu.:   0.3500   3rd Qu.: 0.21000   3rd Qu.:  0.030
 Max.   : 154.952   Max.   :58.00        Max.   :   2.2200   Max.   :52.00000   Max.   :317.000
 accel_dumbbell_x  accel_dumbbell_y  accel_dumbbell_z  magnet_dumbbell_x magnet_dumbbell_y
 Min.   :-419.00   Min.   :-189.00   Min.   :-334.00   Min.   :-643.0    Min.   :-3600
 1st Qu.: -50.00   1st Qu.:  -8.00   1st Qu.:-142.00   1st Qu.:-535.0    1st Qu.:  231
 Median :  -8.00   Median :  41.50   Median :  -1.00   Median :-479.0    Median :  311
 Mean   : -28.62   Mean   :  52.63   Mean   : -38.32   Mean   :-328.5    Mean   :  221
 3rd Qu.:  11.00   3rd Qu.: 111.00   3rd Qu.:  38.00   3rd Qu.:-304.0    3rd Qu.:  390
 Max.   : 235.00   Max.   : 315.00   Max.   : 318.00   Max.   : 592.0    Max.   :  633
 magnet_dumbbell_z  roll_forearm       pitch_forearm     yaw_forearm      total_accel_forearm
 Min.   :-262.00   Min.   :-180.0000   Min.   :-72.50   Min.   :-180.00   Min.   :  0.00
 1st Qu.: -45.00   1st Qu.:  -0.7375   1st Qu.:  0.00   1st Qu.: -68.60   1st Qu.: 29.00
 Median :  13.00   Median :  21.7000   Median :  9.24   Median :   0.00   Median : 36.00
 Mean   :  46.05   Mean   :  33.8265   Mean   : 10.71   Mean   :  19.21   Mean   : 34.72
 3rd Qu.:  95.00   3rd Qu.: 140.0000   3rd Qu.: 28.40   3rd Qu.: 110.00   3rd Qu.: 41.00
 Max.   : 452.00   Max.   : 180.0000   Max.   : 89.80   Max.   : 180.00   Max.   :108.00
 gyros_forearm_x   gyros_forearm_y     gyros_forearm_z    accel_forearm_x   accel_forearm_y
 Min.   :-22.000   Min.   : -7.02000   Min.   : -8.0900   Min.   :-498.00   Min.   :-632.0
 1st Qu.: -0.220   1st Qu.: -1.46000   1st Qu.: -0.1800   1st Qu.:-178.00   1st Qu.:  57.0
 Median :  0.050   Median :  0.03000   Median :  0.0800   Median : -57.00   Median : 201.0
 Mean   :  0.158   Mean   :  0.07517   Mean   :  0.1512   Mean   : -61.65   Mean   : 163.7
 3rd Qu.:  0.560   3rd Qu.:  1.62000   3rd Qu.:  0.4900   3rd Qu.:  76.00   3rd Qu.: 312.0
 Max.   :  3.970   Max.   :311.00000   Max.   :231.0000   Max.   : 477.00   Max.   : 923.0
 accel_forearm_z   magnet_forearm_x  magnet_forearm_y magnet_forearm_z classe
 Min.   :-446.00   Min.   :-1280.0   Min.   :-896.0   Min.   :-973.0   A:5580
 1st Qu.:-182.00   1st Qu.: -616.0   1st Qu.:   2.0   1st Qu.: 191.0   B:3797
 Median : -39.00   Median : -378.0   Median : 591.0   Median : 511.0   C:3422
 Mean   : -55.29   Mean   : -312.6   Mean   : 380.1   Mean   : 393.6   D:3216
 3rd Qu.:  26.00   3rd Qu.:  -73.0   3rd Qu.: 737.0   3rd Qu.: 653.0   E:3607
 Max.   : 291.00   Max.   :  672.0   Max.   :1480.0   Max.   :1090.0

To train the model, I did the following:
trainCtrl <- trainControl(method = "cv", number = 10, savePredictions = TRUE)
rfModel <- train(classe ~., method = "rf", trControl = trainCtrl, preProcess = "pca", data = training, prox = TRUE)

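The model works. However, I'm quite annoyed by the warning message invalid mtry: reset to within valid range, which is repeated 20 times. Some searching on Google did not return any useful insight. Also, not sure if it matters, but there are no NA values in the data set; they were removed in an earlier step.

I also ran system.time(), and the processing took well over one hour.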
> system.time(train(classe ~., method = "rf", trControl = trainCtrl, preProcess = "pca", data = training, prox = TRUE))
    user   system  elapsed
6478.113  302.281 7044.483

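It would be great if you could help explain what this warning message means and why it occurs. I'd also love to hear any comments on such a long processing time.

Thanks!

Best answer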

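The caret rf method uses the randomForest function from the randomForest package. If you set the mtry argument of randomForest to a value greater than the number of predictor variables, you get the warning you posted (for example, try rf = randomForest(mpg ~ ., mtry=15, data=mtcars)). The model still runs, but randomForest resets mtry to a lower, valid value.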
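The inline example above can be run as a small script. This is a minimal sketch, assuming the randomForest package is installed; the withCallingHandlers wrapper and the warnings_seen variable are illustrative additions used only to capture the warning text, and are not part of the original answer.

```r
library(randomForest)

# mtcars has 10 predictors once mpg is used as the response,
# so mtry = 15 is outside the valid range 1..10.
warnings_seen <- character(0)  # hypothetical helper variable
set.seed(1)
rf <- withCallingHandlers(
  randomForest(mpg ~ ., mtry = 15, data = mtcars),
  warning = function(w) {
    # Record the message, then suppress the default printing.
    warnings_seen <<- c(warnings_seen, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

print(warnings_seen)  # the captured warning text
print(rf$mtry)        # the mtry value the fitted forest actually used
```

Comparing rf$mtry with the requested value of 15 shows that the forest was fitted with a value inside the valid range.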

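The question is: why is train (or one of the functions it calls) passing randomForest an mtry value that is too large? I'm not sure, but here is a guess. Setting preProcess="pca" reduces the number of features fed to randomForest (relative to the number of features in the original data), because the least important principal components are discarded to reduce the dimensionality of the feature set. However, when doing cross-validation, train may still set the maximum mtry value for randomForest based on the larger number of features in the original data, rather than on the preprocessed data set that is actually fed to randomForest. Circumstantial evidence for this is that the warning goes away if you remove the preProcess="pca" argument, but I did not check any further.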
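The size of the mismatch described in this guess can be estimated with base R alone. The sketch below is an approximation, not what caret runs internally: it uses prcomp on centred and scaled predictors and assumes a 95% cumulative-variance cutoff, which is the documented default thresh of caret's preProcess. The names predictors, cum_var and n_comp are illustrative.

```r
# Predictors only: drop the response column mpg.
predictors <- mtcars[, setdiff(names(mtcars), "mpg")]

# Centre and scale, then run PCA (an approximation of preProcess = "pca").
pca <- prcomp(predictors, center = TRUE, scale. = TRUE)

# Cumulative share of variance explained by the first k components.
cum_var <- cumsum(pca$sdev^2) / sum(pca$sdev^2)

# Smallest number of components reaching the assumed 95% threshold.
n_comp <- which(cum_var >= 0.95)[1]

cat("original predictors:", ncol(predictors), "\n")
cat("components kept at 95%:", n_comp, "\n")
```

Any mtry candidate larger than n_comp would be out of range for a forest fitted on the retained components, which is consistent with the guess above if the candidates are derived from the original predictor count.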

Reproducible code showing that the warning goes away without pca:

library(caret)
trainCtrl <- trainControl(method = "cv", number = 10, savePredictions = TRUE)
# With PCA preprocessing: the invalid mtry warning appears
rfModel <- train(mpg ~., method = "rf", trControl = trainCtrl, preProcess = "pca", data = mtcars, prox = TRUE)
# Without PCA preprocessing: the warning goes away
rfModel <- train(mpg ~., method = "rf", trControl = trainCtrl, data = mtcars, prox = TRUE)
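One possible way to probe the guess further, which the original answer does not try, is to keep preProcess = "pca" but pass the mtry candidates explicitly through train's tuneGrid argument, so that none of them can exceed the number of retained components. This is a sketch under assumptions: the caret and randomForest packages are installed, and the candidate values 2 and 3 are picked only because they are small; rfModelGrid is an illustrative name.

```r
library(caret)

set.seed(1)
trainCtrl <- trainControl(method = "cv", number = 10, savePredictions = TRUE)

# Explicit mtry candidates (assumed values, chosen to stay below the
# number of principal components kept after PCA).
mtryGrid <- expand.grid(mtry = c(2, 3))

rfModelGrid <- train(mpg ~ ., method = "rf", trControl = trainCtrl,
                     preProcess = "pca", tuneGrid = mtryGrid,
                     data = mtcars, prox = TRUE)

# Resampling results, one row per mtry candidate that was tried.
print(rfModelGrid$results)
```

If the warning no longer appears with this grid, that would support the explanation that the default mtry candidates are derived from the original predictor count rather than from the PCA-reduced data.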

Regarding r - caret method = "rf" warning message: invalid mtry: reset to within valid range, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/49186277/
