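I'm using the R caret package to build models. I use PCA for dimensionality reduction in preprocessing and then try to fit a logistic regression model.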

I get this error:

    Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
      contrasts can be applied only to factors with 2 or more levels

    library(caret)

    credit <- read.csv('~Loans Question/RequiredAttributesWithLoanStatus.csv')

    credit$LoanStatus <- as.factor(credit$LoanStatus)

    str(credit)
    'data.frame':   8580 obs. of  45 variables:
     $ ListingCategory            : int  1 7 3 1 1 7 1 1 1 1 ...
     $ IncomeRange                : int  3 4 6 4 4 3 3 4 3 3 ...
     $ StatedMonthlyIncome        : num  2583 4326 10500 4167 5667 ...
     $ IncomeVerifiable           : logi  TRUE TRUE TRUE FALSE TRUE TRUE ...
     $ DTIwProsperLoan            : num  1.8e-01 2.0e-01 1.7e-01 1.0e+06 1.8e-01 4.4e-01 2.2e-01 2.0e-01 2.0e-01 3.1e-01 ...
     $ EmploymentStatusDescription: Factor w/ 7 levels "Employed","Full-time",..: 1 4 1 7 1 1 1 1 1 1 ...
     $ Occupation                 : Factor w/ 65 levels "","Accountant/CPA",..: 37 37 20 14 43 58 48 37 37 37 ...
     $ MonthsEmployed             : int  4 44 159 67 26 16 209 147 24 9 ...
     $ BorrowerState              : Factor w/ 48 levels "AK","AL","AR",..: 22 32 5 5 14 28 4 10 10 34 ...
     $ BorrowerCity               : Factor w/ 3089 levels "AARONSBURG","ABERDEEN",..: 1737 3059 2488 654 482 719 895 1699 2747 1903 ...
     $ BorrowerMetropolitanArea   : Factor w/ 1 level "(Not Implemented)": 1 1 1 1 1 1 1 1 1 1 ...
     $ LenderIndicator            : int  0 0 0 1 0 0 0 0 1 0 ...
     $ GroupIndicator             : logi  FALSE FALSE FALSE TRUE FALSE FALSE ...
     $ GroupName                  : Factor w/ 83 levels "","00 Used Car Loans",..: 1 1 1 47 1 1 1 1 1 1 ...
     $ ChannelCode                : int  90000 90000 90000 80000 40000 40000 90000 90000 80000 90000 ...
     $ AmountParticipation        : int  0 0 0 0 0 0 0 0 0 0 ...
     $ MonthlyDebt                : int  247 785 1631 817 644 1524 427 817 654 749 ...
     $ CurrentDelinquencies       : int  0 0 0 0 0 0 0 1 0 1 ...
     $ DelinquenciesLast7Years    : int  0 10 0 0 0 0 0 0 0 0 ...
     $ PublicRecordsLast10Years   : int  0 1 0 0 0 0 1 0 1 0 ...
     $ PublicRecordsLast12Months  : int  0 0 0 0 0 0 0 0 0 0 ...
     $ FirstRecordedCreditLine    : Factor w/ 4719 levels "1/1/00 0:00",..: 3032 2673 1197 2541 4698 4345 3150 925 4452 2358 ...
     $ CreditLinesLast7Years      : int  53 30 36 26 7 22 15 20 34 32 ...
     $ InquiriesLast6Months       : int  2 8 5 0 0 0 0 3 0 0 ...
     $ AmountDelinquent           : int  0 0 0 0 0 0 0 63 0 15 ...
     $ CurrentCreditLines         : int  10 10 18 10 4 11 6 10 7 8 ...
     $ OpenCreditLines            : int  9 10 15 8 3 8 5 7 7 8 ...
     $ BankcardUtilization        : num  0.26 0.69 0.94 0.69 0.81 0.38 0.55 0.24 0.03 0 ...
     $ TotalOpenRevolvingAccounts : int  9 7 12 10 3 5 4 5 4 6 ...
     $ InstallmentBalance         : int  48648 14827 0 0 0 30916 0 21619 41340 15447 ...
     $ RealEstateBalance          : int  0 0 577745 0 0 0 191296 0 0 126039 ...
     $ RevolvingBalance           : int  5265 9967 94966 50511 37871 22463 19550 2436 1223 3236 ...
     $ RealEstatePayment          : int  0 0 4159 0 0 0 1303 0 0 1279 ...
     $ RevolvingAvailablePercent  : int  78 52 36 45 18 61 44 74 96 76 ...
     $ TotalInquiries             : int  8 11 15 2 0 0 1 7 1 1 ...
     $ TotalTradeItems            : int  53 30 36 26 7 22 15 20 34 32 ...
     $ SatisfactoryAccounts       : int  52 23 36 26 7 19 15 18 34 29 ...
     $ NowDelinquentDerog         : int  0 0 0 0 0 0 0 1 0 1 ...
     $ WasDelinquentDerog         : int  1 7 0 0 0 3 0 1 0 2 ...
     $ OldestTradeOpenDate        : int  5092001 5011977 12011984 4272000 9081993 9122000 6161987 11181999 9191990 4132000 ...
     $ DelinquenciesOver30Days    : int  0 6 0 0 0 13 0 2 0 2 ...
     $ DelinquenciesOver60Days    : int  0 4 0 0 0 0 0 0 0 1 ...
     $ DelinquenciesOver90Days    : int  0 10 0 0 0 0 0 0 0 0 ...
     $ IsHomeowner                : logi  FALSE FALSE TRUE FALSE FALSE FALSE ...
     $ LoanStatus                 : Factor w/ 4 levels "1","2","3","4": 4 2 2 4 4 4 4 4 4 3 ...

    summary(credit)
    ListingCategory   IncomeRange    StatedMonthlyIncome IncomeVerifiable
     Min.   : 0.000   Min.   :1.000   Min.   :     0      Mode :logical
     1st Qu.: 1.000   1st Qu.:3.000   1st Qu.:  3167      FALSE:784
     Median : 2.000   Median :4.000   Median :  4750      TRUE :7796
     Mean   : 4.997   Mean   :4.089   Mean   :  5755      NA's :0
     3rd Qu.: 7.000   3rd Qu.:5.000   3rd Qu.:  7083
     Max.   :20.000   Max.   :7.000   Max.   :250000

     DTIwProsperLoan     EmploymentStatusDescription
     Min.   :      0.0   Employed     :7182
     1st Qu.:      0.1   Full-time    : 416
     Median :      0.2   Not employed : 122
     Mean   :  91609.4   Other        : 475
     3rd Qu.:      0.3   Part-time    :   7
     Max.   :1000000.0   Retired      :  32
                         Self-employed: 346
                        Occupation   MonthsEmployed   BorrowerState
     Other                   :2421   Min.   :-23.00   CA     :1056
     Professional            :1040   1st Qu.: 26.00   FL     : 608
     Computer Programmer     : 345   Median : 68.00   NY     : 574
     Executive               : 334   Mean   : 97.44   TX     : 532
     Administrative Assistant: 325   3rd Qu.:139.00   IL     : 443
     Teacher                 : 301   Max.   :755.00   GA     : 343
     (Other)                 :3814   NA's   :5        (Other):5024
        BorrowerCity       BorrowerMetropolitanArea LenderIndicator
     CHICAGO  : 121   (Not Implemented):8580        Min.   :0.00000
     NEW YORK :  91                                 1st Qu.:0.00000
     BROOKLYN :  88                                 Median :0.00000
     HOUSTON  :  64                                 Mean   :0.09196
     LAS VEGAS:  53                                 3rd Qu.:0.00000
     ATLANTA  :  51                                 Max.   :1.00000
     (Other)  :8112
     GroupIndicator                                     GroupName
     Mode :logical                                           :8326
     FALSE:8325      We do not accept new membership requests:  39
     TRUE :255       BORROWERS - LARGEST GROUP               :  29
     NA's :0         LendersClub                             :  17
                     Debt Consolidators                      :  12
                     Have Money - Will Bid                   :  10
                     (Other)                                 : 147
      ChannelCode    AmountParticipation  MonthlyDebt      CurrentDelinquencies
     Min.   :40000   Min.   :0           Min.   :    0.0   Min.   : 0.0000
     1st Qu.:80000   1st Qu.:0           1st Qu.:  364.0   1st Qu.: 0.0000
     Median :80000   Median :0           Median :  708.0   Median : 0.0000
     Mean   :77196   Mean   :0           Mean   :  885.5   Mean   : 0.4119
     3rd Qu.:90000   3rd Qu.:0           3rd Qu.: 1205.2   3rd Qu.: 0.0000
     Max.   :90000   Max.   :0           Max.   :30213.0   Max.   :21.0000

     DelinquenciesLast7Years PublicRecordsLast10Years PublicRecordsLast12Months
     Min.   : 0.000          Min.   : 0.0000          Min.   :0.00000
     1st Qu.: 0.000          1st Qu.: 0.0000          1st Qu.:0.00000
     Median : 0.000          Median : 0.0000          Median :0.00000
     Mean   : 4.009          Mean   : 0.2809          Mean   :0.01364
     3rd Qu.: 3.000          3rd Qu.: 0.0000          3rd Qu.:0.00000
     Max.   :99.000          Max.   :11.0000          Max.   :4.00000

     FirstRecordedCreditLine CreditLinesLast7Years InquiriesLast6Months
     12/1/93 0:00:  20       Min.   :  2.0         Min.   : 0.0000
     3/1/95 0:00 :  19       1st Qu.: 16.0         1st Qu.: 0.0000
     6/1/90 0:00 :  17       Median : 24.0         Median : 1.0000
     6/1/89 0:00 :  16       Mean   : 26.1         Mean   : 0.9994
     12/1/90 0:00:  15       3rd Qu.: 34.0         3rd Qu.: 1.0000
     2/1/94 0:00 :  14       Max.   :115.0         Max.   :15.0000
     (Other)     :8479
     AmountDelinquent CurrentCreditLines OpenCreditLines  BankcardUtilization
     Min.   :     0   Min.   : 0.000     Min.   : 0.000   Min.   :0.0000
     1st Qu.:     0   1st Qu.: 5.000     1st Qu.: 5.000   1st Qu.:0.2500
     Median :     0   Median : 9.000     Median : 8.000   Median :0.5400
     Mean   :  1195   Mean   : 9.345     Mean   : 8.306   Mean   :0.5182
     3rd Qu.:     0   3rd Qu.:12.000     3rd Qu.:11.000   3rd Qu.:0.7900
     Max.   :179158   Max.   :54.000     Max.   :42.000   Max.   :2.2300

     TotalOpenRevolvingAccounts InstallmentBalance RealEstateBalance
     Min.   : 0.000             Min.   :     0     Min.   :      0
     1st Qu.: 3.000             1st Qu.:  3338     1st Qu.:      0
     Median : 6.000             Median : 14453     Median :  26154
     Mean   : 6.441             Mean   : 24900     Mean   : 109306
     3rd Qu.: 9.000             3rd Qu.: 32238     3rd Qu.: 176542
     Max.   :44.000             Max.   :739371     Max.   :1938421
                                NA's   :328
     RevolvingBalance RealEstatePayment RevolvingAvailablePercent TotalInquiries
     Min.   :     0   Min.   :    0.0   Min.   :  0.00            Min.   : 0.00
     1st Qu.:  2799   1st Qu.:    0.0   1st Qu.: 29.00            1st Qu.: 2.00
     Median :  8784   Median :  346.5   Median : 52.00            Median : 3.00
     Mean   : 19555   Mean   :  830.5   Mean   : 51.46            Mean   : 3.91
     3rd Qu.: 21110   3rd Qu.: 1382.2   3rd Qu.: 75.00            3rd Qu.: 5.00
     Max.   :695648   Max.   :13651.0   Max.   :100.00            Max.   :36.00

     TotalTradeItems SatisfactoryAccounts NowDelinquentDerog WasDelinquentDerog
     Min.   :  2.0   Min.   :  1.00       Min.   : 0.0000    Min.   : 0.000
     1st Qu.: 16.0   1st Qu.: 14.00       1st Qu.: 0.0000    1st Qu.: 0.000
     Median : 24.0   Median : 21.00       Median : 0.0000    Median : 1.000
     Mean   : 26.1   Mean   : 23.34       Mean   : 0.4119    Mean   : 2.343
     3rd Qu.: 34.0   3rd Qu.: 30.25       3rd Qu.: 0.0000    3rd Qu.: 3.000
     Max.   :115.0   Max.   :113.00       Max.   :21.0000    Max.   :32.000

     OldestTradeOpenDate DelinquenciesOver30Days DelinquenciesOver60Days
     Min.   : 1011957    Min.   : 0.000          Min.   : 0.000
     1st Qu.: 4101996    1st Qu.: 0.000          1st Qu.: 0.000
     Median : 7191993    Median : 1.000          Median : 0.000
     Mean   : 6934230    Mean   : 4.332          Mean   : 1.908
     3rd Qu.:10011990    3rd Qu.: 5.000          3rd Qu.: 2.000
     Max.   :12312004    Max.   :99.000          Max.   :73.000

     DelinquenciesOver90Days IsHomeowner     LoanStatus
     Min.   : 0.000          Mode :logical   1:1847
     1st Qu.: 0.000          FALSE:4264      2:1262
     Median : 0.000          TRUE :4316      3: 256
     Mean   : 4.009          NA's :0         4:5215
     3rd Qu.: 3.000
     Max.   :99.000

    try(na.fail(credit))

    glmFit <- train(LoanStatus ~ ., credit, method = "glm", family = binomial, preProcess = c("pca"),
        trControl = trainControl(method = "cv"))
    Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
      contrasts can be applied only to factors with 2 or more levels

    logregFit <- train(LoanStatus ~ ., credit, method = "logreg", family = binomial, preProcess = c("pca"),
        trControl = trainControl(method = "cv"))
    Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
      contrasts can be applied only to factors with 2 or more levels

Best answer

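Looking at the error message and the variables in your dataset, the variable BorrowerMetropolitanArea has only one level (and a variable that takes the same value for every sample has no predictive value anyway). My guess is that this is what trips up the contrasts function when the dataset is preprocessed with PCA.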
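As a rough check on that diagnosis, the same failure can be reproduced in base R without caret or PCA, because the contrasts are applied when the design matrix is built from the formula. The sketch below uses a made-up toy data frame; the names `d`, `y`, and `f` are hypothetical and only stand in for the real data.

```r
# Toy data: `f` is a factor with a single level, like BorrowerMetropolitanArea.
d <- data.frame(y = c(1, 2, 3, 4), f = factor(rep("a", 4)))

# Building the design matrix for a formula is the step that applies contrasts.
res <- try(model.matrix(y ~ f, data = d), silent = TRUE)

# Record whether the call failed and, if so, what the error message was.
failed <- inherits(res, "try-error")
msg <- if (failed) conditionMessage(attr(res, "condition")) else ""

print(failed)
print(msg)
```

If this fails with the "2 or more levels" message, the single-level factor alone is enough to trigger the error, whatever preprocessing is requested.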

Try calling the train function on the dataset without the BorrowerMetropolitanArea variable.
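One way to do that without hard-coding the column name is to drop every column that holds a single distinct value. This is a minimal sketch on a toy data frame, not the asker's real data; `toy`, `is_constant`, and `toy_clean` are hypothetical names. In the posted summary, AmountParticipation also has a minimum and maximum of 0, so it is included here as a second constant column.

```r
# Toy stand-in for `credit` with two constant columns (values are made up).
toy <- data.frame(
  LoanStatus = factor(c("1", "2", "1", "2", "1", "2")),
  MonthlyDebt = c(247, 785, 1631, 817, 644, 1524),
  AmountParticipation = rep(0L, 6),
  BorrowerMetropolitanArea = factor(rep("(Not Implemented)", 6))
)

# A column is treated as constant if it has fewer than two distinct
# non-missing values.
is_constant <- vapply(
  toy,
  function(x) length(unique(x[!is.na(x)])) < 2,
  logical(1)
)

# Names of the columns that would be dropped.
constant_cols <- names(toy)[is_constant]
print(constant_cols)

# Keep only the columns that actually vary.
toy_clean <- toy[, !is_constant, drop = FALSE]
print(names(toy_clean))
```

Applied to the real data, the same filter would give a reduced data frame that can be passed to `train` in place of `credit`. caret also provides `nearZeroVar()`, which flags constant and nearly constant predictors and may be a more convenient option.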
