Problem description
Hi, I am using the caret package to train a model with the knn algorithm, but I am running into an error. I am using the German credit data, and this is the structure of the data frame:
'data.frame': 1000 obs. of 21 variables:
$ checking_balance : Factor w/ 4 levels "< 0 DM","> 200 DM",..: 1 3 4 1 1
$ months_loan_duration: int 6 48 12 42 24 36 24 36 12 30 ...
$ credit_history : Factor w/ 5 levels "critical","delayed",..: 1 5 1 5
$ purpose : Factor w/ 10 levels "business","car (new)",..: 8 8 5
$ amount : int 1169 5951 2096 7882 4870 9055 2835 6948 3059
$ savings_balance : Factor w/ 5 levels "< 100 DM","> 1000 DM",..: 5 1
$ employment_length : Factor w/ 5 levels "> 7 yrs","0 - 1 yrs",..: 1 3 4
$ installment_rate : int 4 2 2 2 3 2 3 2 2 4 ...
$ personal_status : Factor w/ 4 levels "divorced male",..: 4 2 4 4 4
$ other_debtors : Factor w/ 3 levels "co-applicant",..: 3 3 3 2 3 3
$ residence_history : int 4 2 3 4 4 4 4 2 4 2 ...
$ property : Factor w/ 4 levels "building society savings",..:
$ age : int 67 22 49 45 53 35 53 35 61
$ installment_plan : Factor w/ 3 levels "bank","none",..: 2 2 2 2 2 2
$ housing : Factor w/ 3 levels "for free","own",..: 2 2 1 2 3
$ existing_credits : int 2 1 1 1 2 1 1 1 ...
$ default : Factor w/ 2 levels "1","2": 1 2 1 1 2 1 1 2 ...
$ dependents : int 1 1 2 2 2 2 1 1 ...
$ telephone : Factor w/ 2 levels "none","yes": 2 1 1 1 2 1 1 .
$ foreign_worker : Factor w/ 2 levels "no","yes": 2 2 2 2 2 2 2 ...
$ job : Factor w/ 4 levels "mangement self-employed",..: 2
The target variable is credit$default.
When I run this code:
cv_opts = trainControl(method="repeatedcv", repeats = 5)
model_knn<-train(trainSet[,predictors],trainSet[,outcomeName],method="knn", trControl=cv_opts)
I get this error:
Something is wrong; all the Accuracy metric values are missing:
Accuracy Kappa
Min. : NA Min. : NA
1st Qu.: NA 1st Qu.: NA
Median : NA Median : NA
Mean :NaN Mean :NaN
3rd Qu.: NA 3rd Qu.: NA
Max. : NA Max. : NA
NA's :3 NA's :3
Error: Stopping
In addition: There were 50 or more warnings (use warnings() to see the first 50)
The same code works with other methods (rpart, ada). Am I missing something in trControl for knn?
Recommended answer
The problem is that knn does not know how to handle categorical predictors when you use the default S3 method of caret's train function:
Example:
library(mlbench)
library(caret)
data(Servo)
summary(Servo)
Motor Screw Pgain Vgain Class
A:36 A:42 3:50 1:47 Min. : 1.00
B:36 B:35 4:66 2:49 1st Qu.:10.50
C:40 C:31 5:26 3:27 Median :18.00
D:22 D:30 6:25 4:22 Mean :21.17
E:33 E:29 5:22 3rd Qu.:33.50
Max. :51.00
So all the predictors are categorical:
predictors <- colnames(Servo)[1:4]
cv_opts = trainControl(method="repeatedcv", repeats = 5)
model_knn <- train(Servo[predictors],
Servo[,5],
method = "knn",
trControl = cv_opts)
This results in:
Something is wrong; all the RMSE metric values are missing:...
To overcome this, use the formula S3 method for train, which converts factors to dummy variables before fitting:
model_knn <- train(Class~.,
data = Servo,
method = "knn",
trControl = cv_opts)
model_knn
k-Nearest Neighbors
167 samples
4 predictor
No pre-processing
Resampling: Cross-Validated (10 fold, repeated 5 times)
Summary of sample sizes: 151, 149, 149, 150, 151, 151, ...
Resampling results across tuning parameters:
k RMSE Rsquared MAE
5 9.124929 0.6404554 7.820686
7 9.356812 0.6393563 7.983302
9 9.775620 0.6169618 8.396535
RMSE was used to select the optimal model using the smallest value.
The final value used for the model was k = 5.
Or you can build your own model matrix and pass it to the default S3 method:
Servo_X <-
model.matrix(Class~.-1,
data = Servo)
model_knn2 <- train(Servo_X,
Servo$Class,
method = "knn",
trControl = cv_opts)
k-Nearest Neighbors
167 samples
16 predictor
No pre-processing
Resampling: Cross-Validated (10 fold, repeated 5 times)
Summary of sample sizes: 149, 151, 151, 150, 151, 151, ...
Resampling results across tuning parameters:
k RMSE Rsquared MAE
5 9.289972 0.6310129 7.869684
7 9.487649 0.6401052 8.021603
9 9.908227 0.6479472 8.604000
RMSE was used to select the optimal model using the smallest value.
The final value used for the model was k = 5.
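Applied to the question's setup, the formula approach might look like the sketch below. The German credit data is not reproduced here, so `trainSet` is a small synthetic stand-in (random values, a few columns and factor levels borrowed from the str() output above), and `model_knn_formula` is a name made up for this example. `predictors` and `outcomeName` are assumed to be character vectors, as the question's indexing suggests.

```r
library(caret)

set.seed(42)
n <- 200

# Synthetic stand-in for the question's trainSet: random values only,
# with column names and factor levels copied from the str() output.
trainSet <- data.frame(
  telephone            = factor(sample(c("none", "yes"), n, replace = TRUE)),
  foreign_worker       = factor(sample(c("no", "yes"), n, replace = TRUE)),
  months_loan_duration = sample(6:48, n, replace = TRUE),
  amount               = sample(500:10000, n, replace = TRUE),
  default              = factor(sample(c("1", "2"), n, replace = TRUE))
)

# Assumed to mirror the question's variables.
predictors  <- c("telephone", "foreign_worker", "months_loan_duration", "amount")
outcomeName <- "default"

cv_opts <- trainControl(method = "repeatedcv", repeats = 5)

# Build `default ~ telephone + ...` from the character vectors, so the
# formula method of train handles the factor-to-dummy conversion.
knn_formula <- reformulate(predictors, response = outcomeName)

model_knn_formula <- train(knn_formula,
                           data = trainSet,
                           method = "knn",
                           trControl = cv_opts)
print(model_knn_formula)
```

Because the outcome here is random, the reported accuracy is meaningless; the point is only that resampling completes without the "all the Accuracy metric values are missing" error.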
Additionally, it's a good idea to use preProc = c("center", "scale") when using knn, since you want all the predictors to be on the same scale.
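To see why scale matters for a distance-based method, here is a small base R illustration. It uses only the first three values of `months_loan_duration` and `amount` shown in the question's str() output; `credit_num`, `raw_dist` and `scaled_dist` are names made up for this sketch.

```r
# First three values of two numeric columns from the str() output above.
credit_num <- data.frame(
  months_loan_duration = c(6, 48, 12),
  amount               = c(1169, 5951, 2096)
)

# Raw Euclidean distances: `amount` spans thousands while
# `months_loan_duration` spans tens, so `amount` dominates.
raw_dist <- dist(credit_num)

# Distances after centering and scaling each column to mean 0, sd 1,
# the same kind of transformation that "center" and "scale" request.
scaled_dist <- dist(scale(credit_num))

print(raw_dist)
print(scaled_dist)
```

On the raw data, the nearest neighbours are decided almost entirely by `amount`; after scaling, both columns contribute to the distance.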
To understand what happens when you use the formula interface, check out:
https://github.com/topepo/caret/blob/master/models/files/knn.R
and also
caret:::knnreg.formula
caret:::knn3.formula
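The same model definition can also be inspected from an R session without opening GitHub. This is a minimal sketch using caret's `getModelInfo`; `knn_info` is a name chosen for the example.

```r
library(caret)

# Look up caret's model definition for method = "knn".
# regex = FALSE asks for an exact match on the model name.
knn_info <- getModelInfo("knn", regex = FALSE)$knn

# The fit function shows which underlying routine train calls
# for classification and for regression.
print(knn_info$fit)

# The tuning parameters caret searches over for this method.
print(knn_info$parameters)
```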