I am trying to run this line:
knn(mydades.training[,-7],mydades.test[,-7],mydades.training[,7],k=5)
But I always get this error:
Error in knn(mydades.training[, -7], mydades.test[, -7], mydades.training[, :
NA/NaN/Inf in foreign function call (arg 6)
In addition: Warning messages:
1: In knn(mydades.training[, -7], mydades.test[, -7], mydades.training[, :
NAs introduced by coercion
2: In knn(mydades.training[, -7], mydades.test[, -7], mydades.training[, :
NAs introduced by coercion
Any ideas?
PS: mydades.training and mydades.test are defined as follows:
N <- nrow(mydades)
permut <- sample(c(1:N),N,replace=FALSE)
ord <- order(permut)
mydades.shuffled <- mydades[ord,]
prop.train <- 1/3
NOMBRE <- round(prop.train*N)
mydades.training <- mydades.shuffled[1:NOMBRE,]
mydades.test <- mydades.shuffled[(NOMBRE+1):N,]
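Best answer
I suspect that your problem lies in non-numeric data fields in 'mydades'. The error line:
NA/NaN/Inf in foreign function call (arg 6)
makes me suspect that the call from knn into its C implementation fails. Many functions in R call an underlying, more efficient C implementation instead of implementing the algorithm purely in R. If you type just 'knn' in your R console, you can inspect the R implementation of 'knn'. It contains the following call: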
Z <- .C(VR_knn, as.integer(k), as.integer(l), as.integer(ntr),
as.integer(nte), as.integer(p), as.double(train), as.integer(unclass(clf)),
as.double(test), res = integer(nte), pr = double(nte),
integer(nc + 1), as.integer(nc), as.integer(FALSE), as.integer(use.all))
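.C means that we are calling a C function named 'VR_knn' with the provided function arguments. Since you get two of the warnings
NAs introduced by coercion
I think that two of the as.double/as.integer calls fail and introduce NA values. If we start counting the arguments, the 6th argument is: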
as.double(train)
which can fail in cases such as:
# as.double cannot translate text fields to doubles; they are coerced to NA values:
> as.double("sometext")
[1] NA
Warning message:
NAs introduced by coercion
# while the following text is cast to double without an error:
> as.double("1.23")
[1] 1.23
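The same check can be run over a whole vector at once. The following is a minimal sketch, not part of the original code: the vector `x` and its contents are made up for illustration. It flags every entry that either fails to coerce or coerces to an infinite value, since both would be rejected by the .C call.

```r
# Hypothetical character vector mixing valid and invalid numeric strings
x <- c("1.23", "foo", "1,23", "2.34EXP-05", "Inf", "4e-3")

# suppressWarnings() hides the "NAs introduced by coercion" warning
num <- suppressWarnings(as.double(x))

# is.finite() is FALSE for NA, NaN, Inf and -Inf
bad <- !is.finite(num)

# Show the offending original strings
x[bad]
```

Note that "Inf" coerces without any warning, so it is only caught by the is.finite() test, not by looking for NA values.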
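You get two coercion warnings, which are probably produced by 'as.double(train)' and 'as.double(test)'. Since you did not give us the exact details of 'mydades', here are some of my best guesses (using artificial multivariate normally distributed data):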
library(MASS)
mydades <- mvrnorm(100, mu=c(1:6), Sigma=matrix(1:36, ncol=6))
mydades <- cbind(mydades, sample(LETTERS[1:5], 100, replace=TRUE))
# This breaks knn
mydades[3,4] <- Inf
# This breaks knn
mydades[4,3] <- -Inf
# The two Inf/-Inf cases above break knn, but they do not produce the 'NAs introduced by coercion' warning
# This breaks knn and gives the same error as yours; just some raw text
mydades[1,2] <- mydades[50,1] <- "foo"
mydades[100,3] <- "bar"
# ... or perhaps wrongly formatted exponential numbers?
mydades[1,1] <- "2.34EXP-05"
# ... or wrong decimal symbol?
mydades[3,3] <- "1,23"
# should be 1.23, as R uses '.' as decimal symbol and not ','
# ... or most likely a whole column is non-numeric, since the error is given twice (as.double problem both in training AND test set)
mydades[,1] <- sample(letters[1:5],100,replace=TRUE)
I would not keep both the numeric data and the class labels in a single matrix; perhaps you could split the data as:
mydadesnumeric <- mydades[,1:6] # 6 first columns
mydadesclasses <- mydades[,7]
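As a rough sketch of how that split might look end to end, assuming clean data: the names `features`, `labels` and `train_idx` are hypothetical, and the identity covariance matrix is chosen only to keep the example simple. The features stay in a numeric matrix and the labels in a separate factor, so nothing is coerced to character.

```r
library(class)  # provides knn()
library(MASS)   # provides mvrnorm()

set.seed(42)

# Hypothetical clean data: 6 numeric features kept as a numeric matrix
features <- mvrnorm(100, mu = 1:6, Sigma = diag(6))

# Class labels kept separately as a factor
labels <- factor(sample(LETTERS[1:5], 100, replace = TRUE))

# Use one third of the rows for training, the rest for testing
train_idx <- sample(nrow(features), round(nrow(features) / 3))

pred <- knn(train = features[train_idx, ],
            test  = features[-train_idx, ],
            cl    = labels[train_idx],
            k     = 5)

# Confusion table of predicted versus true labels on the test rows
table(pred, labels[-train_idx])
```

Because cbind() on a numeric matrix and a character vector turns the whole matrix into character, keeping the two apart avoids relying on as.double() to parse strings back into numbers.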
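Calling
str(mydades); summary(mydades)
may also help you/us locate the problematic data entries, so that you can correct them to numeric entries or leave out the non-numeric fields.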
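To pinpoint the exact cells rather than just the columns, something along these lines might help. This is only a sketch: `feat` is a made-up character matrix standing in for the feature columns, not the real 'mydades'.

```r
# Hypothetical stand-in for the feature columns: a character matrix
set.seed(1)
feat <- matrix(as.character(round(rnorm(12), 2)), nrow = 4)
feat[2, 3] <- "foo"   # raw text
feat[4, 1] <- "1,23"  # wrong decimal symbol

# Coerce to double while keeping the matrix shape
num <- suppressWarnings(matrix(as.double(feat), nrow = nrow(feat)))

# Row/column positions of entries that are NA, NaN or infinite
bad_cells <- which(!is.finite(num), arr.ind = TRUE)
bad_cells

# The original strings at those positions
feat[bad_cells]
```

The row and column indices in `bad_cells` point straight at the entries to fix or drop before calling knn.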
The rest of the code as you provided it (run after breaking the data):
N <- nrow(mydades)
permut <- sample(c(1:N),N,replace=FALSE)
ord <- order(permut)
mydades.shuffled <- mydades[ord,]
prop.train <- 1/3
NOMBRE <- round(prop.train*N)
mydades.training <- mydades.shuffled[1:NOMBRE,]
mydades.test <- mydades.shuffled[(NOMBRE+1):N,]
# 7th column seems to be the class labels
knn(train=mydades.training[,-7],test=mydades.test[,-7],mydades.training[,7],k=5)
Regarding r - error in the knn function, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/16874038/