This is a question related to https://stats.stackexchange.com/questions/21572/how-to-plot-decision-boundary-of-a-k-nearest-neighbor-classifier-from-elements-o

For completeness, here's the original example from that link:

    library(ElemStatLearn)
    require(class)
    x <- mixture.example$x
    g <- mixture.example$y
    xnew <- mixture.example$xnew
    mod15 <- knn(x, xnew, g, k=15, prob=TRUE)
    prob <- attr(mod15, "prob")
    prob <- ifelse(mod15=="1", prob, 1-prob)
    px1 <- mixture.example$px1
    px2 <- mixture.example$px2
    prob15 <- matrix(prob, length(px1), length(px2))
    par(mar=rep(2,4))
    contour(px1, px2, prob15, levels=0.5, labels="", xlab="", ylab="",
            main="15-nearest neighbour", axes=FALSE)
    points(x, col=ifelse(g==1, "coral", "cornflowerblue"))
    gd <- expand.grid(x=px1, y=px2)
    points(gd, pch=".", cex=1.2, col=ifelse(prob15>0.5, "coral", "cornflowerblue"))
    box()

I've been playing with that example, and would like to try to make it work with three classes. I can change some values of g with something like

    g[8:16] <- 2

just to pretend that some of the samples come from a third class. I can't make the plot work, though.
I guess I need to change the lines that deal with the proportion of votes for the winning class:

    prob <- attr(mod15, "prob")
    prob <- ifelse(mod15=="1", prob, 1-prob)

and also the levels on the contour:

    contour(px1, px2, prob15, levels=0.5, labels="", xlab="", ylab="",
            main="15-nearest neighbour", axes=FALSE)

I am also not sure contour is the right tool for this. One alternative that works is to create a matrix of data covering the region I'm interested in, classify each point of that matrix, and plot those points with a large marker and different colours, similar to what is done with the points(gd...) bit.

The final purpose is to be able to show the different decision boundaries generated by different classifiers. Can someone point me in the right direction?

Thanks, Rafael

Solution

Separating the main parts of the code will help outline how to achieve this.

Training data with 3 classes:

    train <- rbind(iris3[1:25,1:2,1], iris3[1:25,1:2,2], iris3[1:25,1:2,3])
    cl <- factor(c(rep("s",25), rep("c",25), rep("v",25)))

Test data covering a grid:

    require(MASS)
    test <- expand.grid(x=seq(min(train[,1]-1), max(train[,1]+1), by=0.1),
                        y=seq(min(train[,2]-1), max(train[,2]+1), by=0.1))

Classification for that grid (with 3 classes this time, obviously):

    require(class)
    classif <- knn(train, test, cl, k = 3, prob=TRUE)
    prob <- attr(classif, "prob")

Data structure for plotting:

    require(dplyr)
    dataf <- bind_rows(mutate(test, prob=prob, cls="c", prob_cls=ifelse(classif==cls, 1, 0)),
                       mutate(test, prob=prob, cls="v", prob_cls=ifelse(classif==cls, 1, 0)),
                       mutate(test, prob=prob, cls="s", prob_cls=ifelse(classif==cls, 1, 0)))

Plot:

    require(ggplot2)
    ggplot(dataf) +
        geom_point(aes(x=x, y=y, col=cls),
                   data = mutate(test, cls=classif), size=1.2) +
        geom_contour(aes(x=x, y=y, z=prob_cls, group=cls, color=cls),
                     bins=2, data=dataf) +
        geom_point(aes(x=x, y=y, col=cls), size=3,
                   data=data.frame(x=train[,1], y=train[,2], cls=cl))

We can also be a little fancier and plot the probability of class membership as an indication of "confidence":

    ggplot(dataf) +
        geom_point(aes(x=x, y=y, col=cls, size=prob),
                   data = mutate(test, cls=classif)) +
        scale_size(range=c(0.8, 2)) +
        geom_contour(aes(x=x, y=y, z=prob_cls, group=cls, color=cls),
                     bins=2, data=dataf) +
        geom_point(aes(x=x, y=y, col=cls), size=3,
                   data=data.frame(x=train[,1], y=train[,2], cls=cl)) +
        geom_point(aes(x=x, y=y), size=3, shape=1,
                   data=data.frame(x=train[,1], y=train[,2], cls=cl))
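For the mixture.example data itself, the grid-classification alternative mentioned in the question also works with plain base graphics. The following is only a minimal sketch, not code from the original answer: it assumes ElemStatLearn's mixture.example is available, fakes a third class with g[8:16] <- 2 as suggested above, classifies every point of the prediction grid, and colours each grid point by the winning class instead of contouring a single probability surface.

    library(ElemStatLearn)   # provides mixture.example
    library(class)           # provides knn()

    x <- mixture.example$x
    g <- mixture.example$y
    g[8:16] <- 2             # pretend a few samples belong to a third class

    px1 <- mixture.example$px1
    px2 <- mixture.example$px2
    gd  <- expand.grid(x = px1, y = px2)   # grid covering the region of interest

    # classify every grid point; for colouring we only need the winning class
    mod15 <- knn(x, gd, factor(g), k = 15, prob = TRUE)

    cols <- c("coral", "cornflowerblue", "darkolivegreen3")  # one colour per class 0/1/2
    par(mar = rep(2, 4))
    plot(gd, pch = ".", cex = 1.2, col = cols[as.integer(mod15)],
         xlab = "", ylab = "", main = "15-nearest neighbour, 3 classes", axes = FALSE)
    points(x, col = cols[g + 1], pch = 19)   # overlay the training samples
    box()

The coloured grid shows the decision regions directly, so the boundary is visible wherever the colour changes, without having to pick a contour level per class.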
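Since the stated goal is to compare decision boundaries from different classifiers, any model that assigns a class to each grid point can be dropped into the same plotting recipe. As an illustration only (this swap is my assumption, not part of the original answer), here is a sketch that re-scores the same test grid with lda() from MASS and reuses the contour trick; it assumes train, cl and test exist as defined above, and the names fit_lda, pred_lda and dataf_lda are introduced just for this example.

    require(MASS)      # lda()
    require(dplyr)
    require(ggplot2)

    # fit LDA on the same two-feature training data and score the same grid
    fit_lda  <- lda(train, grouping = cl)
    pred_lda <- predict(fit_lda, test)$class

    # one copy of the grid per class, with a 0/1 indicator for "predicted as this class"
    dataf_lda <- bind_rows(lapply(levels(cl), function(k)
        mutate(test, cls = k, prob_cls = ifelse(pred_lda == k, 1, 0))))

    ggplot(dataf_lda) +
        geom_point(aes(x = x, y = y, col = cls),
                   data = mutate(test, cls = pred_lda), size = 1.2) +
        geom_contour(aes(x = x, y = y, z = prob_cls, group = cls, color = cls),
                     bins = 2, data = dataf_lda) +
        geom_point(aes(x = x, y = y, col = cls), size = 3,
                   data = data.frame(x = train[, 1], y = train[, 2], cls = cl))

Plotting this next to the knn version should make the difference between the two boundaries (piecewise-linear for LDA, irregular for knn) easy to see.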