Plotting linear discriminant analysis, classification tree and Naive Bayes curves on a single ROC plot

This post covers how to plot linear discriminant analysis, classification tree and Naive Bayes curves on a single ROC plot. The question and the recommended answer are reproduced below.

Problem description


The data is supplied at the very bottom of the page and is called LDA.scores. This is a classification task where I performed three supervised machine-learning classification techniques on the data set. All code is supplied to show how these ROC curves were produced. I apologise for asking a loaded question, but I have been trying to solve these issues using different combinations of code for almost two weeks, so if anyone can help me, thank you. The main issue is that the Naive Bayes curve shows a perfect score of 1, which is obviously wrong, and I cannot work out how to incorporate the linear discriminant analysis curve into a single ROC plot for comparison using the code supplied.


  1. Linear discriminant analysis (LDA) with the MASS package

  2. Naive Bayes (NB) with the klaR package

  3. Classification trees (CT) with the rpart package



Aims




  • A single ROC plot showing the ROC curve for each classification technique, for comparison, with a legend.

  • The area under the curve (AUC) calculated for each classification technique.


    1. Each classification technique was performed in a different R package and I cannot combine the ROC curves onto a single plot. All error messages are shown at the bottom of the page.

    2. The ROC curves for LDA and NB both look spurious.

    3. I cannot apply a legend and get error messages when I try.






    I have provided the code for all three techniques so anyone can assess my logic step by step.

    Linear Discriminant Analysis

       library(MASS)
       predictors <- as.matrix(LDA.scores[, 2:13])
       response <- as.factor(LDA.scores[, 1])

       # Perform LDA. With CV = TRUE, lda() returns the leave-one-out
       # predictions directly ($class, $posterior) rather than a model object,
       # so a second fit without CV is needed for predict() and the
       # discriminant scores plotted below.
       Family.lda <- lda(response ~ predictors, CV = TRUE)
       Family.lda.fit <- lda(response ~ predictors)
       predict.Family <- predict(Family.lda.fit)
       tab <- table(response, Family.lda$class)
    






    Construct Confusion Matrix to predict classes

       conCV1 <- rbind(tab[1, ]/sum(tab[1, ]), tab[2, ]/sum(tab[2, ]))
       dimnames(conCV1) <- list(Actual = c("No", "Yes"), "Predicted (cv)"= c("No", "Yes"))
    
       print(round(conCV1, 3))
    






    Plot discriminant scores

        library(lattice)
        windows(width=10, height=7)
        densityplot(~predict.Family$x, groups=LDA.scores$Family)
    






    Function to Calculate Confusion Matrices

     confusion <- function(actual, predicted, names = NULL, printit = TRUE, prior = NULL) {
       if (is.null(names))
         names <- levels(actual)
       tab <- table(actual, predicted)
       acctab <- t(apply(tab, 1, function(x) x/sum(x)))
       dimnames(acctab) <- list(Actual = names, "Predicted (cv)" = names)
       if (is.null(prior)) {
         relnum <- table(actual)
         prior <- relnum/sum(relnum)
         acc <- sum(tab[row(tab) == col(tab)])/sum(tab)
       }
       else {
         acc <- sum(prior * diag(acctab))
         names(prior) <- names
       }
       if (printit)
         print(round(c("Overall accuracy" = acc, "Prior frequency" = prior), 4))
       if (printit) {
         cat("\nConfusion matrix", "\n")
         print(round(acctab, 4))
       }
       invisible(acctab)
     }
    






    Changing the proportions to create a training and test set (70:30)

      prior <- c(0.7, 0.3)
      # Note: prior reweights the class prior probabilities inside lda();
      # it does not split the rows into 70% training / 30% testing sets
      lda.70.30 <- lda(response ~ predictors, CV = TRUE, prior = prior)
      confusion(response, lda.70.30$class, prior = c(0.7, 0.3))
    






    A function to create a ROC Curve

     # Sweep the prior from 0.05 to 0.95 and record the true/false positive rates
     truepos <- numeric(19)
     falsepos <- numeric(19)
     p1 <- (1:19)/20
     for (i in 1:19) {
       p <- p1[i]
       Family.ROC <- lda(response ~ predictors, CV = TRUE, prior = c(p, 1 - p))
       confmat <- confusion(LDA.scores$Family, Family.ROC$class, printit = FALSE)
       falsepos[i] <- confmat[1, 2]
       truepos[i] <- confmat[2, 2]
     }
    






    Plot ROC curve

     windows(width=10, height=7)
     # Note: base plot() draws the curve but returns NULL, so LDA.ROC is NULL
     LDA.ROC <- plot(truepos ~ falsepos, type = "l", lwd = 2,
                     xlab = "False positive rate (1 - Specificity)",
                     ylab = "True positive rate (Sensitivity)", col = "green")
     abline(a = 0, b = 1, col = "red")
    






    Figure 1

      index <- 1:nrow(LDA.scores)
      trainindex <- sample(index, trunc(length(index) * 0.70), replace = FALSE)
      LDA.70.trainset3 <- LDA.scores[trainindex, ]
      LDA.30.testset3 <- LDA.scores[-trainindex, ]
    






    Grow Trees with the 70% training set

    # Grow the tree with the 70% training set
    
      library(rpart)
      tree.split3<-rpart(Family~., data=LDA.70.trainset3, method="class")
      summary(tree.split3)
      print(tree.split3)
      plot(tree.split3)
      text(tree.split3,use.n=T,digits=0)
      printcp(tree.split3)
    






    Make Classification Tree Predictions using the test and training set (70:30)

     # Default rpart prediction is class probabilities; attach the observed classes
     res3 <- predict(tree.split3, newdata = LDA.30.testset3)
     res4 <- as.data.frame(res3)
     res4$actual <- LDA.30.testset3$Family
    






    Create a binary system (0 or 1) for a binomial distribution for the categorical grouping factor

     res4$actual2 <- NA
     res4$actual2[res4$actual == "G8"] <- 1
     res4$actual2[res4$actual == "v4"] <- 0   # the data uses lowercase "v4"
    






    Plotting the ROC curve

     library(ROCR)
     fit.pred <- prediction(res4$G8, res4$actual2)  # probability column for the class coded 1
     fit.perf <- performance(fit.pred, "tpr", "fpr")
     plot(fit.perf, col="blue", lwd=2)
     abline(0,1,col="grey")
    

    Figure 2

     library(klaR)
     library(caret)
    






    Generate the test and training set 70:30

     trainIndex <- createDataPartition(LDA.scores$Family, p=0.70, list=FALSE)
     sig.train=LDA.scores[trainIndex,]
     sig.test=LDA.scores[-trainIndex,]
    






    Build the NB model and make predictions on the test set

     sig.train$Family <- as.factor(sig.train$Family)
     sig.test$Family <- as.factor(sig.test$Family)
     nbmodel <- NaiveBayes(Family ~ ., data = sig.train)
     nb.prediction <- predict(nbmodel, sig.test[2:13])  # renamed to avoid masking ROCR::prediction
     NB <- as.data.frame(nb.prediction)
     colnames(NB) <- c("Family", "Actual", "Predicted")  # predicted class, then the two posteriors
    






    Create a binary system (0 or 1) for a binomial distribution for the categorical factor

     # NB$Family holds the *predicted* class (see colnames above), not the observed one
     NB$actual2 <- NA
     NB$actual2[NB$Family == "v4"] <- 1   # the data uses lowercase "v4"
     NB$actual2[NB$Family == "G8"] <- 0
     NB2 <- as.data.frame(NB)
    






    Plot ROC curve: this curve looks suspect

     library(ROCR)
     windows(width=10, height=7)
     roc_pred.NB<- prediction(NB2$Predicted, NB2$actual2)
     perf.NB <- performance(roc_pred.NB, "tpr", "fpr")
     plot(perf.NB, col="orange", lwd=2)
     abline(0,1,col="grey")
    


    Figure 3: this ROC curve is obviously wrong
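
    The perfect score is most likely because actual2 above is derived from NB$Family, which holds the predicted classes, so the posterior probabilities are being scored against themselves. A minimal sketch of a fix (assuming the rows of NB2 are in the same order as sig.test, and that the "Predicted" column is the posterior for class "v4", as the renaming above implies) is to take the labels from the test set instead:

     # Score the NB posteriors against the observed test labels, not the predictions
     NB2$actual2 <- ifelse(sig.test$Family == "v4", 1, 0)
     roc_pred.NB <- prediction(NB2$Predicted, NB2$actual2)
     perf.NB <- performance(roc_pred.NB, "tpr", "fpr")
     plot(perf.NB, col = "orange", lwd = 2)
     abline(0, 1, col = "grey")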

     windows(width=10, height=7)
     plot(fit.perf, col="blue", lwd=2)            # CT
     plot(LDA.ROC, col="green", lwd=2, add=TRUE)  # LDA: fails because LDA.ROC is NULL (see errors below)
     plot(perf.NB, col="orange", lwd=2, add=TRUE) # NB
     abline(0,1,col="red", lwd=2)
    






    Error messages

     Warning in min(x) : no non-missing arguments to min; returning Inf
     Warning in max(x) : no non-missing arguments to max; returning -Inf
     Warning in min(x) : no non-missing arguments to min; returning Inf
     Warning in max(x) : no non-missing arguments to max; returning -Inf
     Warning in plot.window(...) : "add" is not a graphical parameter
     Error in plot.window(...) : need finite 'xlim' values
     plot(fit.NB,lwd=2,col="orange", lwd=2, add=T); #NB
     Error in plot(fit.NB, lwd = 2, col = "orange", lwd = 2, add = T) :
     error in evaluating the argument 'x' in selecting a method for function
     Warning in plot.window(...) : "add" is not a graphical parameter
     Error in plot.window(...) : need finite 'xlim' values
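
    These errors are consistent with LDA.ROC being NULL: base plot() draws to the device and returns NULL, so plot(LDA.ROC, add = TRUE) has nothing to draw, and "add" is not a base-graphics parameter. A sketch of a working overlay (assuming fit.perf and perf.NB are the ROCR performance objects created above, and the falsepos/truepos vectors from the LDA loop) is:

     # Draw the ROCR curves, then overlay the LDA points with lines()
     windows(width = 10, height = 7)
     plot(fit.perf, col = "blue", lwd = 2)               # CT
     plot(perf.NB, col = "orange", lwd = 2, add = TRUE)  # NB: plot.performance supports add
     ord <- order(falsepos)
     lines(falsepos[ord], truepos[ord], col = "green", lwd = 2)  # LDA
     abline(0, 1, col = "red", lwd = 2)
     legend("bottomright", legend = c("CT", "NB", "LDA"),
            col = c("blue", "orange", "green"), lwd = 2)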
    






    Area under the curve

      auc1 <- performance(fit.pred, "auc")     # CT
      auc2 <- performance(roc_pred.NB, "auc")  # NB
      auc1@y.values; auc2@y.values             # the AUC values sit in the y.values slot






    I am unsure how to calculate the area under the curve for the LDA code supplied.
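
    One possible approach (a sketch, assuming the falsepos and truepos vectors from the LDA loop above) is the trapezoidal rule, after anchoring the curve at (0,0) and (1,1):

     # Trapezoidal AUC from the prior-sweep ROC points
     ord <- order(falsepos)
     fpr <- c(0, falsepos[ord], 1)
     tpr <- c(0, truepos[ord], 1)
     auc.lda <- sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)
     auc.lda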

    Production of the legend

    This code produces error messages

    legend(c('fit.pred', 'fit.NB', 'LDA.ROC'), col=c('blue', 'orange', 'green'), lwd=3)
    # (legend() also needs a position as its first argument, e.g. legend("bottomright", ...))
    






    Data named LDA.scores

           Family    Swimming Not.Swimming      Running  Not.Running
        1      v4 -0.48055680 -0.086292700 -0.157157188 -0.438809944
        2      v4  0.12600625 -0.074481895  0.057316151 -0.539013927
        3      v4  0.06823834 -0.056765686  0.064711783 -0.539013927
        4      v4  0.67480139 -0.050860283  0.153459372 -0.539013927
        5      v4  0.64591744 -0.050860283  0.072107416 -0.472211271
        6      v4  0.21265812 -0.068576492  0.057316151 -0.071395338
        7      v4 -0.01841352 -0.068576492 -0.053618335 -0.071395338
        8      v4  0.12600625  0.055436970  0.012942357  0.296019267
        9      v4 -0.22060120  0.114491000 -0.038827070  0.563229889
        10     v4  0.27042603 -0.021333268  0.049920519 -0.037994010
        11     v4  0.03935439 -0.044954880  0.012942357  0.195815284
        12     v4 -0.45167284  0.008193747 -0.075805232 -0.171599321
        13     v4 -0.04729748 -0.056765686  0.035129254 -0.305204632
        14     v4 -0.10506539  0.008193747 -0.046222702  0.062209973
        15     v4  0.09712230  0.037720761  0.109085578 -0.104796666
        16     v4 -0.07618143  0.014099150 -0.038827070  0.095611301
        17     v4  0.29930998  0.108585597  0.057316151  0.028808645
        18     v4  0.01047043 -0.074481895  0.020337989 -0.071395338
        19     v4 -0.24948516  0.002288344  0.035129254  0.329420595
        20     v4 -0.04729748  0.049531567  0.057316151  0.296019267
        21     v4 -0.01841352  0.043626164  0.005546724 -0.171599321
        22     v4 -0.19171725  0.049531567 -0.016640173 -0.071395338
        23     v4 -0.48055680  0.020004552 -0.142365923  0.596631217
        24     v4  0.01047043  0.008193747  0.220020063  0.062209973
        25     v4 -0.42278889  0.025909955 -0.149761556  0.028808645
        26     v4 -0.45167284  0.031815358 -0.134970291 -0.138197994
        27     v4 -0.30725307  0.049531567  0.042524886  0.095611301
        28     v4  0.24154207 -0.039049477  0.072107416 -0.104796666
        29     v4  1.45466817 -0.003617059  0.064711783  0.296019267
        30     v4 -0.01841352  0.002288344  0.020337989  0.028808645
        31     G8  0.38596185  0.084963985  0.049920519 -0.037994010
        32     G8  0.15489021 -0.080387298  0.020337989 -0.338605960
        33     G8 -0.04729748  0.067247776  0.138668107  0.129012629
        34     G8  0.27042603  0.031815358  0.049920519  0.195815284
        35     G8 -0.07618143  0.037720761  0.020337989 -0.037994010
        36     G8 -0.10506539  0.025909955 -0.083200864  0.396223251
        37     G8 -0.01841352  0.126301805 -0.024035805  0.362821923
        38     G8  0.01047043  0.031815358 -0.016640173 -0.138197994
        39     G8  0.06823834  0.037720761 -0.038827070  0.262617940
        40     G8 -0.16283329 -0.050860283 -0.038827070 -0.405408616
        41     G8 -0.01841352 -0.039049477  0.005546724 -0.205000649
        42     G8 -0.39390493 -0.003617059 -0.090596497  0.129012629
        43     G8 -0.04729748  0.008193747 -0.009244540  0.195815284
        44     G8  0.01047043 -0.039049477 -0.016640173 -0.205000649
        45     G8  0.01047043 -0.003617059 -0.075805232 -0.004592683
        46     G8  0.06823834  0.008193747 -0.090596497 -0.205000649
        47     G8 -0.04729748  0.014099150  0.012942357 -0.071395338
        48     G8 -0.22060120 -0.015427865 -0.075805232 -0.171599321
        49     G8 -0.16283329  0.020004552 -0.061013967 -0.104796666
        50     G8 -0.07618143  0.031815358 -0.038827070 -0.138197994
        51     G8 -0.22060120  0.020004552 -0.112783394 -0.104796666
        52     G8 -0.19171725 -0.033144074 -0.068409599 -0.071395338
        53     G8 -0.16283329 -0.039049477 -0.090596497 -0.104796666
        54     G8 -0.22060120 -0.009522462 -0.053618335 -0.037994010
        55     G8 -0.13394934 -0.003617059 -0.075805232 -0.004592683
        56     G8 -0.27836911 -0.044954880 -0.090596497 -0.238401977
        57     G8 -0.04729748 -0.050860283  0.064711783  0.028808645
        58     G8  0.01047043 -0.044954880  0.012942357 -0.305204632
        59     G8  0.12600625 -0.068576492  0.042524886 -0.305204632
        60     G8  0.06823834 -0.033144074 -0.061013967 -0.271803305
        61     G8  0.06823834 -0.027238671 -0.061013967 -0.037994010
        62     G8  0.32819394 -0.068576492  0.064711783 -0.372007288
        63     G8  0.32819394  0.014099150  0.175646269  0.095611301
        64     G8 -0.27836911  0.002288344 -0.068409599  0.195815284
        65     G8  0.18377416  0.025909955  0.027733621  0.162413956
        66     G8  0.55926557 -0.009522462  0.042524886  0.229216612
        67     G8 -0.19171725 -0.009522462 -0.038827070  0.229216612
        68     G8 -0.19171725  0.025909955 -0.009244540  0.396223251
        69     G8  0.01047043  0.155828820  0.027733621  0.630032545
        70     G8 -0.19171725  0.002288344 -0.031431438  0.463025906
        71     G8 -0.01841352 -0.044954880 -0.046222702  0.496427234
        72     G8 -0.07618143 -0.015427865 -0.031431438  0.062209973
        73     G8 -0.13394934  0.008193747 -0.068409599 -0.071395338
        74     G8 -0.39390493  0.037720761 -0.120179026  0.229216612
        75     G8 -0.04729748  0.008193747  0.035129254 -0.071395338
        76     G8 -0.27836911 -0.015427865 -0.061013967 -0.071395338
        77     G8  0.70368535 -0.056765686  0.397515240 -0.205000649
        78     G8  0.29930998  0.079058582  0.138668107  0.229216612
        79     G8 -0.13394934 -0.056765686  0.020337989 -0.305204632
        80     G8  0.21265812  0.025909955  0.035129254  0.396223251
           Family    Fighting Not.Fighting        Resting  Not.Resting
        1      v4 -0.67708172 -0.097624192  0.01081204879 -0.770462870
        2      v4 -0.58224128 -0.160103675 -0.03398160805  0.773856776
        3      v4 -0.11436177 -0.092996082  0.05710879700 -2.593072768
        4      v4 -0.34830152 -0.234153433 -0.04063432116 -2.837675606
        5      v4 -0.84568695 -0.136963126 -0.13084281035 -1.680828329
        6      v4 -0.32933343 -0.157789620 -0.02997847693 -0.947623773
        7      v4  0.35984044 -0.157789620  0.12732080268 -0.947623773
        8      v4 -0.32511830 -0.023574435 -0.10281705810 -2.607366431
        9      v4  1.51478626  0.001880170  0.08155320398 -0.637055341
        10     v4  0.11114773 -0.224897213 -0.17932134171 -1.818396455
        11     v4  0.27975296 -0.109194467 -0.14338902206  2.170944974
        12     v4 -0.89626852 -0.069855533 -0.02058415581 -0.658126752
        13     v4  0.12379312 -0.123078796 -0.11528274705 -0.808243774
        14     v4  0.66965255 -0.111508522 -0.11764091337  2.377766908
        15     v4  1.56536783 -0.143905291  0.04389156236  2.111220276
        16     v4  0.56427428 -0.099938247  0.01399844913 -0.322326312
        17     v4 -0.71291033 -0.118450687 -0.05755560242  2.218858946
        18     v4 -0.75927677  1.519900201  0.04711630687  3.920878638
        19     v4 -0.75295407  0.177748344  0.01584280360 -0.304945754
        20     v4 -1.00164679  0.108326696  0.09348590900  1.038591535
        21     v4 -1.03958296  0.652129604  0.09677967302  1.752268128
        22     v4  0.82139726  0.638245274  0.02053612974  0.907465624
        23     v4 -1.07541157 -0.072169588 -0.03608286844  1.137774798
        24     v4 -1.03115270  0.087500202  0.07805238146 -3.663486997
        25     v4 -0.98900139 -0.180930170 -0.00009686695  2.350924346
        26     v4 -1.06908888 -0.146219346 -0.02285413055  0.067293462
        27     v4 -1.20186549 -0.049029039 -0.00424187149 -1.898454393
        28     v4  0.58324237 -0.125392851  0.01446241356 -2.497647463
        29     v4 -0.97003330 -0.134649071  0.03187450017 -4.471716512
        30     v4  0.22917139 -0.060599313  0.11323315542 -1.465081244
        31     G8  0.41042201 -0.086053918 -0.01171898422 -0.232806371
        32     G8 -1.11545531 -0.197128554 -0.06499053655 -3.043893581
        33     G8 -0.19023412 -0.083739863 -0.07758659568 -2.323908986
        34     G8  0.25446217 -0.092996082 -0.07399758157  1.437404886
        35     G8 -0.05324237  0.844196163 -0.11503350996  1.079056696
        36     G8  0.09007207  0.055103433  0.02167111711  1.110865131
        37     G8  1.21129685  1.971140911  0.01904454162  1.404724068
        38     G8  0.62539368 -0.111508522  0.05768779393 -1.706664294
        39     G8  1.32932051 -0.224897213  0.05555202379  0.736746935
        40     G8  0.40199175 -0.187872334 -0.01031175326 -0.005516985
        41     G8  0.44625062 -0.160103675 -0.00458313459  1.727170333
        42     G8  0.60221046 -0.194814499  0.17430774591  1.685228831
        43     G8  0.33665722 -0.053657149  0.00481502094  1.836016918
        44     G8 -0.63493041 -0.206384774 -0.00928412956  0.466173920
        45     G8 -0.28296700  0.108326696  0.09047589183  1.697173771
        46     G8 -0.32722587 -0.164731785  0.08917985896  1.057314221
        47     G8 -0.11646933  0.187004564 -0.05671203072  0.933704227
        48     G8 -0.10171637  0.025020719 -0.05333390954  0.482480775
        49     G8  0.13643851  0.057417488  0.08541446168  0.680713089
        50     G8 -0.57802615  0.434608441  0.10140397965  0.090780703
        51     G8  0.05002833  0.057417488 -0.02509342995  0.680713089
        52     G8 -0.16072820  0.073615872 -0.03698779080 -0.982921741
        53     G8 -0.29139726 -0.035144709  0.04609635201 -2.281900378
        54     G8  0.13222338 -0.051343094  0.06524159499  0.972089090
        55     G8 -0.41152848 -0.134649071  0.08459773090  0.027767791
        56     G8  0.68229794 -0.185558279 -0.03239032508 -0.162881500
        57     G8 -0.24292325  0.013450444 -0.03208740616 -0.530221948
        58     G8 -0.11646933 -0.134649071  0.06264952925 -0.385741863
        59     G8 -0.21341734 -0.215640993  0.05241547086 -0.972251823
        60     G8 -0.24292325 -0.185558279 -0.03437271856  0.002267358
        61     G8 -0.24292325 -0.005061995 -0.03437271856 -1.134447998
        62     G8  0.09007207 -0.238781543 -0.06747523863  0.626424009
        63     G8 -0.34197883 -0.099938247 -0.01270059491 -0.722750217
        64     G8 -0.30825778 -0.167045840  0.10014629095 -0.382722075
        65     G8 -0.08696342 -0.208698829 -0.02872845706 -0.356550578
        66     G8 -0.81196590  0.048161268 -0.00950652573 -1.851614124
        67     G8  0.49683219  0.048161268  0.04867308008 -1.851614124
        68     G8 -0.13754498 -0.037458764  0.02486518629  1.731465143
        69     G8 -0.48318570  0.161549960 -0.05951115497  0.254319006
        70     G8  0.39988418  0.031962884 -0.02353665674  2.043778341
        71     G8  0.90148474 -0.102252302 -0.01967923345 -0.289913920
        72     G8  0.28396809 -0.123078796 -0.10148651548  1.386940871
        73     G8  1.05322945 -0.139277181 -0.00480936518  0.054207713
        74     G8  1.24923303 -0.208698829 -0.00098261723  0.594212936
        75     G8  0.47154141 -0.118450687 -0.13970798195  1.551821303
        76     G8  1.27873894 -0.072169588 -0.00286148145  3.100704184
        77     G8  0.05002833 -0.044400929 -0.05492902692  0.327263666
        78     G8  1.54218461 -0.030516599  0.10732815358 -1.055195336
        79     G8  0.74763247 -0.132335016  0.11660744219 -1.134447998
        80     G8  0.11747042 -0.037458764 -0.02016620439  1.730726972
           Family    Fighting      Hunting Not.Hunting     Grooming
        1      v4 -0.67708172  0.114961983   0.2644238  0.105443109
        2      v4 -0.58224128  0.556326739  -1.9467488 -0.249016133
        3      v4 -0.11436177  0.326951992   2.1597867 -0.563247851
        4      v4 -0.34830152  0.795734469   2.1698228 -0.611969290
        5      v4 -0.84568695  0.770046573   0.2554708 -0.230476117
        6      v4 -0.32933343  0.736574466   0.1225477 -0.270401826
        7      v4  0.35984044  0.215724268   0.1225477  1.057451389
        8      v4 -0.32511830 -0.200731013   0.2593696 -0.260830004
        9      v4  1.51478626 -2.160535836   0.8687508  1.030589923
        10     v4  0.11114773  0.660462182   1.7955299 -0.809959417
        11     v4  0.27975296 -0.293709087  -0.8377330 -0.292132450
        12     v4 -0.89626852  0.565754284   1.3339454 -0.573854465
        13     v4  0.12379312 -0.499644710  -0.5100101 -0.372285683
        14     v4  0.66965255  0.080624964  -2.6852985 -0.470590886
        15     v4  1.56536783 -4.076143639  -0.8432925  1.657328707
        16     v4  0.56427428 -0.127040484  -0.8662526 -0.161145079
        17     v4 -0.71291033  0.661240603  -2.1990933 -0.381900622
        18     v4 -0.75927677  0.294950237  -3.5062302 -0.121909231
        19     v4 -0.75295407  0.548369546  -1.3326746 -0.338568723
        20     v4 -1.00164679  0.137622686  -1.7580862 -0.312742050
        21     v4 -1.03958296  0.019302681  -2.2730277  0.708985315
        22     v4  0.82139726 -0.043057497  -3.1829838 -0.378408200
        23     v4 -1.07541157  0.351515502  -0.3762928 -0.304161903
        24     v4 -1.03115270 -0.007163636   1.3605877 -0.431053223
        25     v4 -0.98900139  0.253780410  -1.1388134 -0.554883286
        26     v4 -1.06908888  0.700680605   0.6629041  0.113074697
        27     v4 -1.20186549  0.340704098   0.9979915 -0.693545361
        28     v4  0.58324237 -1.727041782   1.5589254  0.180163686
        29     v4 -0.97003330  0.209410408   1.7613786 -0.258156792
        30     v4  0.22917139 -2.441026901   1.3929340  0.276959818
        31     G8  0.41042201  0.383257784  -0.5374467  0.165978418
        32     G8 -1.11545531 -1.098682982   2.9654839  0.148947473
        33     G8 -0.19023412  0.873144122   2.5120581 -0.846910101
        34     G8  0.25446217  0.968889915  -0.4130434 -0.938661624
        35     G8 -0.05324237  0.936455703  -2.5993065 -0.949914982
        36     G8  0.09007207 -0.467815937  -1.0766479  1.474170593
        37     G8  1.21129685 -1.239490708  -4.1335895  1.357023559
        38     G8  0.62539368  0.177235670   2.4989896  1.393241265
        39     G8  1.32932051 -4.736158229  -0.5718146  2.467225606
        40     G8  0.40199175  0.342693397   0.5675981  0.648320657
        41     G8  0.44625062  0.488950070  -1.6998195  0.709588943
        42     G8  0.60221046 -0.415575233  -1.4313741  0.728473890
        43     G8  0.33665722  0.353937257  -2.2985148  0.379706002
        44     G8 -0.63493041  0.262083568   0.2245685 -0.367629121
        45     G8 -0.28296700  0.574316915  -1.0020637  0.280710938
        46     G8 -0.32722587  0.323665326  -1.1559252  0.119455912
        47     G8 -0.11646933  0.786566398   0.1746772 -0.858206576
        48     G8 -0.10171637  0.718065343  -0.2673407 -0.552555005
        49     G8  0.13643851  0.584868846  -0.1203383 -0.335378116
        50     G8 -0.57802615 -0.053955393   0.6359729  0.057885811
        51     G8  0.05002833  0.738563765  -0.1203383 -0.188308359
        52     G8 -0.16072820  0.778263240   2.1906890 -0.545138998
        53     G8 -0.29139726  0.751018502   1.6039070  0.198100074
        54     G8  0.13222338  0.297804447  -0.5217068 -0.514310832
        55     G8 -0.41152848  0.102161281   0.3866610 -0.036323341
        56     G8  0.68229794  0.371667959   1.6179863 -0.176365139
        57     G8 -0.24292325  0.631574111   1.4206594 -0.269668849
        58     G8 -0.11646933 -0.004568899   1.6827511  0.003731717
        59     G8 -0.21341734  0.214080935   1.0590019  0.036586351
        60     G8 -0.24292325  0.796339908   1.2727184 -0.615289246
        61     G8 -0.24292325  0.796339908   2.6745838 -0.615289246
        62     G8  0.09007207 -0.396720145   0.2644238  0.290800156
        63     G8 -0.34197883  0.441985331   1.4545220 -0.520648930
        64     G8 -0.30825778 -2.489721464   1.3587105  1.711267220
        65     G8 -0.08696342  0.407907785   0.8136610 -0.273333736
        66     G8 -0.81196590  0.554423932   1.3666527 -0.594420949
        67     G8  0.49683219  0.697912886   1.3666527 -0.446661330
        68     G8 -0.13754498  0.491198842  -1.3307974 -0.333825929
        69     G8 -0.48318570  0.604848320  -0.1305910 -0.601492025
        70     G8  0.39988418  0.773938679  -0.5078441 -0.712559657
        71     G8  0.90148474  0.734412186  -0.1166561 -0.548803885
        72     G8  0.28396809  1.145505011  -1.3062489 -0.921846260
        73     G8  1.05322945  0.616784110   0.9039851 -0.165629176
        74     G8  1.24923303  0.329287256   0.3647117  0.111867440
        75     G8  0.47154141 -0.016764163  -1.1586689 -0.476713403
        76     G8  1.27873894  0.007799347  -3.0386529  0.215087903
        77     G8  0.05002833  0.209496900  -1.5080522  0.324560232
        78     G8  1.54218461 -5.031179821   1.6811626  2.366893936
        79     G8  0.74763247 -0.325105405   1.6851337  1.351590903
        80     G8  0.11747042 -0.756350687  -1.3315194  0.375911766
    
           Family Not.Grooming
        1      v4  0.019502286
        2      v4 -0.290451956
        3      v4  0.359948884
        4      v4  0.557840914
        5      v4  0.117453376
        6      v4  0.126645924
        7      v4  0.126645924
        8      v4  0.196486873
        9      v4  0.152780228
        10     v4  0.354469789
        11     v4 -0.261430968
        12     v4  0.176448238
        13     v4 -0.007374708
        14     v4 -0.557848621
        15     v4 -0.213674557
        16     v4 -0.005819262
        17     v4 -0.470070992
        18     v4 -0.786078864
        19     v4  0.006063789
        20     v4 -0.271842650
        21     v4 -0.349418792
        22     v4 -0.338096262
        23     v4 -0.165119403
        24     v4  0.346566439
        25     v4 -0.344191931
        26     v4  0.074321265
        27     v4  0.179825379
        28     v4  0.278407054
        29     v4  0.593125727
        30     v4  0.199177375
        31     G8 -0.058900625
        32     G8  0.633875622
        33     G8  0.428150308
        34     G8 -0.206023441
        35     G8 -0.436958199
        36     G8 -0.291839246
        37     G8 -0.907641911
        38     G8  0.448567295
        39     G8 -0.127186127
        40     G8  0.024715134
        41     G8 -0.416345030
        42     G8 -0.330697382
        43     G8 -0.469720666
        44     G8 -0.047494017
        45     G8 -0.301732446
        46     G8 -0.138901021
        47     G8  0.098101379
        48     G8 -0.002063769
        49     G8 -0.028324190
        50     G8  0.071630763
        51     G8 -0.028324190
        52     G8  0.295110588
        53     G8  0.347112947
        54     G8 -0.083577573
        55     G8 -0.036886152
        56     G8  0.189045953
        57     G8  0.467596992
        58     G8  0.303378276
        59     G8  0.218879697
        60     G8  0.092005711
        61     G8  0.270111340
        62     G8 -0.012909856
        63     G8  0.262292068
        64     G8  0.107125772
        65     G8  0.123422927
        66     G8  0.299426602
        67     G8  0.299426602
        68     G8 -0.326871824
        69     G8 -0.022088391
        70     G8 -0.428508341
        71     G8 -0.014675497
        72     G8 -0.114462294
        73     G8  0.087227267
        74     G8 -0.031519161
        75     G8 -0.159318008
        76     G8 -0.397875854
        77     G8  0.101520559
        78     G8  0.244481505
        79     G8  0.529968994
        80     G8 -0.326619590
    


    Recommended answer


    First of all, one of the most important issues when subsetting your data into training and testing subsets is that the data have to be randomized prior to subsetting; otherwise your categories will be divided unequally between the training and testing subsets.
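
    A quick illustration of the problem (added here as a sketch; the rows of LDA.scores are ordered by class, v4 first and then G8):

    # An unshuffled 70:30 split: the first 56 rows contain all 30 v4 cases,
    # so the held-out 30% contains no v4 at all
    naive_train <- LDA.scores[1:56, ]
    table(naive_train$Family)

    # After shuffling, both classes appear in roughly their overall proportions
    shuffled <- LDA.scores[sample(nrow(LDA.scores)), ]
    table(shuffled[1:56, ]$Family)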


    Some notes on the code below. I have used the caret package for simplicity for the model fitting methods.

    library(pROC)
    library(MASS)
    library(caret)
    
    set.seed(1234)
    
    mydat <- read.table("~/Desktop/family.txt", header = TRUE, stringsAsFactors= FALSE)
    mydat$Family <- factor(mydat$Family, levels = c("v4", "G8"))
    
    # Randomly permute the data before subsetting
    mydat_idx <- sample(1:nrow(mydat), replace = FALSE)
    mydat <- mydat[mydat_idx, ]
    
    mydat_resampled_idx <- createDataPartition(mydat_idx, times = 1, p = 0.7, list = FALSE)
    mydat_resampled <- mydat[mydat_resampled_idx, ] # Training portion of the data
    

    Linear discriminant analysis

    lda_mod <- train(x = mydat_resampled[, 2:9], y = as.factor(mydat_resampled[, 1]),
                     method = "lda", trControl = trainControl(method = "cv", classProbs = TRUE))
    
    # Generate model predictions
    lda_pred <- predict(lda_mod, newdata = mydat[ , 2:9], type = "prob")
    
    # Store the predictions with the data set
    mydat['lda_pred'] <- lda_pred["G8"] # Here we only want the probability associated
                                        # with the class (Y = 1), or in this case, G8
    

    Naive Bayes

    nb_tune <- data.frame(usekernel =TRUE, fL = 0)
    nb_mod <- train(x = mydat_resampled[, 2:9], y = as.factor(mydat_resampled[, 1]),
        method = "nb", trControl = trainControl(method = "cv", classProbs = TRUE), tuneGrid = nb_tune)
    
    # Model predictions
    nb_pred <- predict(nb_mod, newdata = mydat[ , 2:9], type = "prob")
    mydat['nb_pred'] <- nb_pred["G8"]
    

    Classification tree

    ct_mod <- train(x = mydat_resampled[, 2:9], y = as.factor(mydat_resampled[, 1]),
                    method = "rpart", trControl = trainControl(method = "cv", classProbs = TRUE))
    ct_pred <- predict(ct_mod, newdata = mydat[ , 2:9], type = "prob")
    mydat['ct_pred'] <- ct_pred["G8"]
    

    ROC curves on the training and testing portions of the data

    The calculation and plotting of the ROC curves and AUCs below have been changed to use the pROC package.

    mydat$binary_response <- as.numeric(mydat$Family) - 1 # convert factor to 0, 1 with G8  = 1
    
    lda_train_roc <- roc(binary_response ~ lda_pred, data = mydat[mydat_resampled_idx, ], ci = TRUE)
    nb_train_roc <- roc(binary_response ~ nb_pred, data =  mydat[mydat_resampled_idx, ], ci = TRUE)
    ct_train_roc <- roc(binary_response ~ ct_pred, data =  mydat[mydat_resampled_idx, ], ci = TRUE)
    
    lda_test_roc <- roc(binary_response ~ lda_pred, data =  mydat[-mydat_resampled_idx, ], ci = TRUE)
    nb_test_roc <- roc(binary_response ~ nb_pred, data =  mydat[-mydat_resampled_idx, ], ci = TRUE)
    ct_test_roc <- roc(binary_response ~ ct_pred, data =  mydat[-mydat_resampled_idx, ], ci = TRUE)
    
    
    par(mfrow = c(2, 1))
    plot(lda_train_roc, las = 1, main = "Training data")
    plot(nb_train_roc, add = TRUE, col = "red")
    plot(ct_train_roc, add = TRUE, col = "blue")
    legend(0.4, 0.4, legend = c("lda", "nb", "ct"), lty = c(1,1,1), col = c("black", "red", "blue"))
    
    plot(lda_test_roc, las = 1, main = "Testing data")
    plot(nb_test_roc, add = TRUE, col = "red")
    plot(ct_test_roc, add = TRUE, col = "blue")
    legend(0.4, 0.4, legend = c("lda", "nb", "ct"), lty = c(1,1,1), col = c("black", "red", "blue"))
    par(mfrow = c(1, 1))
    
    # AUC with 95% CI
    lda_train_roc$ci[c(2, 1, 3)] # 0.8353741 [0.7235472, 0.9472011]

    nb_train_roc$ci[c(2, 1, 3)]  # 0.9714286 [0.9303684, 1.0000000]

    ct_train_roc$ci[c(2, 1, 3)]  # 0.7619048 [0.6524637, 0.8713458]
    lda_test_roc$ci[c(2, 1, 3)]  # 0.6148148 [0.3555396, 0.8740900]

    nb_test_roc$ci[c(2, 1, 3)]   # 0.7407407 [0.5345984, 0.9468831]
    ct_test_roc$ci[c(2, 1, 3)]   # 0.6000000 [0.4139795, 0.7860205]
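
    To tie this back to the original aims, the AUC estimates can also be folded into the legend labels; a short sketch using pROC's auc():

    # Legend labels that carry each model's test-set AUC
    aucs <- sapply(list(lda = lda_test_roc, nb = nb_test_roc, ct = ct_test_roc),
                   function(r) as.numeric(auc(r)))
    labels <- sprintf("%s (AUC = %.3f)", names(aucs), aucs)
    plot(lda_test_roc, las = 1, main = "Testing data")
    plot(nb_test_roc, add = TRUE, col = "red")
    plot(ct_test_roc, add = TRUE, col = "blue")
    legend("bottomright", legend = labels, lty = 1, col = c("black", "red", "blue"))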
    

    This concludes this post on plotting linear discriminant analysis, classification tree and Naive Bayes curves on a single ROC plot; hopefully the recommended answer above is helpful.
