Problem description
Does anyone know why the following KNN code in R gives different predictions for different seeds? This is strange because K <- 5 is odd, so the majority vote should be well defined. The floating-point values are also not small enough for this to be a data-precision problem. (Remark: I know the test point is very different from the training data. This is only a synthetic example created to demonstrate the strange KNN behavior.)
library(class)
train <- rbind(
c(0.0626015, 0.0530052, 0.0530052, 0.0496676, 0.0530052, 0.0626015),
c(0.0565861, 0.0569546, 0.0569546, 0.0511377, 0.0569546, 0.0565861),
c(0.0538332, 0.057786, 0.057786, 0.0506127, 0.057786, 0.0538332),
c(0.059033, 0.0541484, 0.0541484, 0.0501926, 0.0541484, 0.059033),
c(0.0587272, 0.0540445, 0.0540445, 0.0505076, 0.0540445, 0.0587272),
c(0.0578095, 0.0564349, 0.0564349, 0.0505076, 0.0564349, 0.0578095)
)
trainLabels <- c(1,
1,
0,
0,
1,
0)
test <- c(0.1923241, 0.1734074, 0.1734074, 0.1647619, 0.1734074, 0.1923241)
K <- 5
seed <- 494139
set.seed(seed)
pred <- knn(train=train, test=test, cl = trainLabels, k=K)
message("predicted: ", pred, ", seed: ", seed)
# predicted: 1, seed: 494139
seed <- 5371
set.seed(seed)
pred <- knn(train=train, test=test, cl = trainLabels, k=K)
message("predicted: ", pred, ", seed: ", seed)
# predicted: 0, seed: 5371
Recommended answer
The knn function calls an underlying C function (line 122) called VR_knn, which includes a step that introduces "fuzz", a small tolerance (epsilon, EPS), when comparing distances. It looks like your example values are running up against that "fuzz" step. Evidence for this is that rounding the training values to 4 digits yields consistent predictions across seeds:
library(class)
train <- rbind(
c(0.0626015, 0.0530052, 0.0530052, 0.0496676, 0.0530052, 0.0626015),
c(0.0565861, 0.0569546, 0.0569546, 0.0511377, 0.0569546, 0.0565861),
c(0.0538332, 0.057786, 0.057786, 0.0506127, 0.057786, 0.0538332),
c(0.059033, 0.0541484, 0.0541484, 0.0501926, 0.0541484, 0.059033),
c(0.0587272, 0.0540445, 0.0540445, 0.0505076, 0.0540445, 0.0587272),
c(0.0578095, 0.0564349, 0.0564349, 0.0505076, 0.0564349, 0.0578095)
)
trainLabels <- c(1,1,0,0,1,0)
test <- c(0.1923241, 0.1734074, 0.1734074, 0.1647619, 0.1734074, 0.1923241)
K <- 5
train <- round(train,4)
seed <- 494139
set.seed(seed)
pred <- knn(train=train, test=test, cl = trainLabels, k=K)
message("predicted: ", pred, ", seed: ", seed)
# predicted: 0, seed: 494139
seed <- 5371
set.seed(seed)
pred <- knn(train=train, test=test, cl = trainLabels, k=K)
message("predicted: ", pred, ", seed: ", seed)
# predicted: 0, seed: 5371
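To see how the "fuzz" could produce a seed-dependent result on the unrounded data, the distances can be checked by hand. The sketch below is not the package's code. It assumes that the tolerance is a relative one of about 1e-4 applied to squared Euclidean distances (my reading of the C source, so treat the value of eps as an assumption), and the names eps, d2, rel_gap and in_vote are made up for illustration. The knn documentation does state that, with the default use.all = TRUE, all distances equal to the k-th are included and that voting ties are broken at random.

```r
train <- rbind(
  c(0.0626015, 0.0530052, 0.0530052, 0.0496676, 0.0530052, 0.0626015),
  c(0.0565861, 0.0569546, 0.0569546, 0.0511377, 0.0569546, 0.0565861),
  c(0.0538332, 0.057786, 0.057786, 0.0506127, 0.057786, 0.0538332),
  c(0.059033, 0.0541484, 0.0541484, 0.0501926, 0.0541484, 0.059033),
  c(0.0587272, 0.0540445, 0.0540445, 0.0505076, 0.0540445, 0.0587272),
  c(0.0578095, 0.0564349, 0.0564349, 0.0505076, 0.0564349, 0.0578095)
)
trainLabels <- c(1, 1, 0, 0, 1, 0)
test <- c(0.1923241, 0.1734074, 0.1734074, 0.1647619, 0.1734074, 0.1923241)
K <- 5
eps <- 1e-4  # ASSUMPTION: relative tolerance used for "equal" distances

# Squared Euclidean distance from the test point to each training row
d2 <- rowSums(sweep(train, 2, test)^2)
sorted_d2 <- sort(d2)

# Relative gap between the K-th and (K+1)-th nearest neighbours
rel_gap <- (sorted_d2[K + 1] - sorted_d2[K]) / sorted_d2[K]

# Neighbours whose distance is within the tolerance of the K-th distance
in_vote <- d2 <= sorted_d2[K] * (1 + eps)

# Class counts among the neighbours that take part in the vote
votes <- table(trainLabels[in_vote])

print(rel_gap)
print(votes)
```

Under these assumptions the 5th and 6th nearest neighbours are closer to each other than the tolerance, so all six training points take part in the vote. The labels then split evenly between the two classes, and the tie has to be broken at random, which is where the seed comes in. After rounding the training data to 4 digits the gap between the 5th and 6th distances grows past the tolerance, only five neighbours vote, and the result no longer depends on the seed.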