Fitting a decision boundary to a logistic regression model in R

Problem description

I'm struggling to plot a decision boundary in R using ggplot. I have two variables (exam scores) and a binary classification indicating whether a student was admitted to school or not. The data looks like this:

    > head(exam.data)
      Exam1Score Exam2Score Admitted
    1   34.62366   78.02469        0
    2   30.28671   43.89500        0
    3   35.84741   72.90220        0
    4   60.18260   86.30855        1
    5   79.03274   75.34438        1
    6   45.08328   56.31637        0

I can plot the data using ggplot:

    library(ggplot2)
    exam.plot <- ggplot(data=exam.data, aes(x=Exam1Score, y=Exam2Score, col = ifelse(Admitted == 1,'dark green','red'), size=0.5))+
      geom_point()+
      labs(x="Exam 1 Scores", y="Exam 2 Scores", title="Exam Scores", colour="Exam Scores")+
      theme_bw()+
      theme(legend.position="none")

and then successfully fit the logistic regression model:

    exam.lm <- glm(data=exam.data, formula=Admitted ~ Exam1Score + Exam2Score, family="binomial")

So after much searching the web, I decided to fit the decision boundary manually (I did try for a while to do this with stat_smooth but couldn't get it to work). I tried the following:

    # Fit the decision boundary
    plot_x <- c(min(exam.data$Exam1Score)-2, max(exam.data$Exam1Score)+2)
    plot_y <- (-1 /coef(exam.lm)[3]) * (coef(exam.lm)[2] * plot_x + coef(exam.lm)[1])
    db.data <- data.frame(rbind(plot_x, plot_y))
    colnames(db.data) <- c('x','y')
    # Add the decision boundary plot
    ggplot()+
      geom_line(data=db.data, aes(x=x, y=y))

which successfully plots the decision boundary, but I can't add it to my existing plot with:

    > exam.plot+geom_line(data=db.data, aes(x=x, y=y))
    Error: Aesthetics must either be length one, or the same length as the data
    Problems:x, y

Can someone point out what I'm doing wrong, or whether I can actually do this with +stat_smooth()?

All code (ex2.R) and files are here: https://github.com/StuHorsman/rscripts/tree/master/R/Coursera

Thanks!
Stuart

Update: I can achieve something similar with:

    plot(exam.data$Exam1Score, exam.data$Exam2Score, type="n", xlab="Exam 1 Scores", ylab="Exam 2 Scores")
    points(exam.data$Exam1Score[exam.data$Admitted==1], exam.data$Exam2Score[exam.data$Admitted==1], pch=4, col="green")
    points(exam.data$Exam1Score[exam.data$Admitted==0], exam.data$Exam2Score[exam.data$Admitted==0], pch=4, col="red")
    lines(db.data, col="blue")

Solution

The problem is that in exam.plot you use not only the aesthetics x and y, but also col and size (the latter unnecessarily). Every layer inherits the aesthetics defined in the ggplot() call, so each layer needs to have all of them set. Here the inherited col mapping refers to Admitted, which does not exist in db.data. (I've been caught often by that problem.)

Thus:

    exam.plot+geom_line(data=db.data, aes(x=x, y=y), col = "black", size = 1)

does plot.

However, I'd recommend changing exam.plot a bit and removing all aesthetics that do not apply to all layers (and putting them into the layer definition instead):

    exam.plot <- ggplot(data=exam.data, aes(x = Exam1Score, y=Exam2Score))+
      geom_point(aes(col = Admitted), size = 0.5)+
      scale_color_manual(values = c('red', 'dark green')) +
      labs(x="Exam 1 Scores", y="Exam 2 Scores", title="Exam Scores", colour="Exam Scores")+
      theme_bw()+
      coord_equal() + # assuming that the scores have the same scale.
      theme(legend.position="none")

    exam.plot + geom_line(data=db.data, aes(x=x, y=y))

Which, with the example data

    exam.data <- data.frame(Exam1Score = rnorm(100) + 0:1,
                            Exam2Score = rnorm(100) + 0:1,
                            Admitted = factor(rep(0:1, 50)))

yields the combined plot of points and boundary line (plotted with the default point size; 0.5 would hardly be visible for this example).
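
For readers wondering where the plot_y expression in the question comes from, here is a short sketch of the derivation. It assumes the boundary is drawn at a predicted admission probability of 0.5, which the post does not state explicitly. The symbols are introduced here for illustration only: \beta_0, \beta_1, \beta_2 stand for coef(exam.lm)[1], coef(exam.lm)[2] and coef(exam.lm)[3], and x_1, x_2 for Exam1Score and Exam2Score.

```latex
\begin{aligned}
p = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \beta_2 x_2)}} &= \tfrac{1}{2} \\
\iff\quad \beta_0 + \beta_1 x_1 + \beta_2 x_2 &= 0 \\
\iff\quad x_2 &= -\frac{1}{\beta_2}\,\bigl(\beta_1 x_1 + \beta_0\bigr), \qquad \beta_2 \neq 0
\end{aligned}
```

The last line matches plot_y term by term, with plot_x playing the role of x_1. Since the boundary is a straight line, two x values are enough to draw it. One thing that may be worth checking: geom_line expects one (x, y) point per row, and rbind(plot_x, plot_y) followed by renaming the columns appears to put both x values in the same row, so data.frame(x = plot_x, y = plot_y) may be closer to what was intended.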