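I am using the R caret package to build models. I use PCA for dimensionality reduction in preprocessing and then try to fit a logistic regression model.
I get this error:
Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
  contrasts can be applied only to factors with 2 or more levels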
credit <- read.csv('~Loans Question/RequiredAttributesWithLoanStatus.csv')
credit$LoanStatus <- as.factor(credit$LoanStatus)
str(credit)
'data.frame': 8580 obs. of 45 variables:
$ ListingCategory : int 1 7 3 1 1 7 1 1 1 1 ...
$ IncomeRange : int 3 4 6 4 4 3 3 4 3 3 ...
$ StatedMonthlyIncome : num 2583 4326 10500 4167 5667 ...
$ IncomeVerifiable : logi TRUE TRUE TRUE FALSE TRUE TRUE ...
$ DTIwProsperLoan : num 1.8e-01 2.0e-01 1.7e-01 1.0e+06 1.8e-01 4.4e-01 2.2e-01 2.0e-01 2.0e-01 3.1e-01 ...
$ EmploymentStatusDescription: Factor w/ 7 levels "Employed","Full-time",..: 1 4 1 7 1 1 1 1 1 1 ...
$ Occupation : Factor w/ 65 levels "","Accountant/CPA",..: 37 37 20 14 43 58 48 37 37 37 ...
$ MonthsEmployed : int 4 44 159 67 26 16 209 147 24 9 ...
$ BorrowerState : Factor w/ 48 levels "AK","AL","AR",..: 22 32 5 5 14 28 4 10 10 34 ...
$ BorrowerCity : Factor w/ 3089 levels "AARONSBURG","ABERDEEN",..: 1737 3059 2488 654 482 719 895 1699 2747 1903 ...
$ BorrowerMetropolitanArea : Factor w/ 1 level "(Not Implemented)": 1 1 1 1 1 1 1 1 1 1 ...
$ LenderIndicator : int 0 0 0 1 0 0 0 0 1 0 ...
$ GroupIndicator : logi FALSE FALSE FALSE TRUE FALSE FALSE ...
$ GroupName : Factor w/ 83 levels "","00 Used Car Loans",..: 1 1 1 47 1 1 1 1 1 1 ...
$ ChannelCode : int 90000 90000 90000 80000 40000 40000 90000 90000 80000 90000 ...
$ AmountParticipation : int 0 0 0 0 0 0 0 0 0 0 ...
$ MonthlyDebt : int 247 785 1631 817 644 1524 427 817 654 749 ...
$ CurrentDelinquencies : int 0 0 0 0 0 0 0 1 0 1 ...
$ DelinquenciesLast7Years : int 0 10 0 0 0 0 0 0 0 0 ...
$ PublicRecordsLast10Years : int 0 1 0 0 0 0 1 0 1 0 ...
$ PublicRecordsLast12Months : int 0 0 0 0 0 0 0 0 0 0 ...
$ FirstRecordedCreditLine : Factor w/ 4719 levels "1/1/00 0:00",..: 3032 2673 1197 2541 4698 4345 3150 925 4452 2358 ...
$ CreditLinesLast7Years : int 53 30 36 26 7 22 15 20 34 32 ...
$ InquiriesLast6Months : int 2 8 5 0 0 0 0 3 0 0 ...
$ AmountDelinquent : int 0 0 0 0 0 0 0 63 0 15 ...
$ CurrentCreditLines : int 10 10 18 10 4 11 6 10 7 8 ...
$ OpenCreditLines : int 9 10 15 8 3 8 5 7 7 8 ...
$ BankcardUtilization : num 0.26 0.69 0.94 0.69 0.81 0.38 0.55 0.24 0.03 0 ...
$ TotalOpenRevolvingAccounts : int 9 7 12 10 3 5 4 5 4 6 ...
$ InstallmentBalance : int 48648 14827 0 0 0 30916 0 21619 41340 15447 ...
$ RealEstateBalance : int 0 0 577745 0 0 0 191296 0 0 126039 ...
$ RevolvingBalance : int 5265 9967 94966 50511 37871 22463 19550 2436 1223 3236 ...
$ RealEstatePayment : int 0 0 4159 0 0 0 1303 0 0 1279 ...
$ RevolvingAvailablePercent : int 78 52 36 45 18 61 44 74 96 76 ...
$ TotalInquiries : int 8 11 15 2 0 0 1 7 1 1 ...
$ TotalTradeItems : int 53 30 36 26 7 22 15 20 34 32 ...
$ SatisfactoryAccounts : int 52 23 36 26 7 19 15 18 34 29 ...
$ NowDelinquentDerog : int 0 0 0 0 0 0 0 1 0 1 ...
$ WasDelinquentDerog : int 1 7 0 0 0 3 0 1 0 2 ...
$ OldestTradeOpenDate : int 5092001 5011977 12011984 4272000 9081993 9122000 6161987 11181999 9191990 4132000 ...
$ DelinquenciesOver30Days : int 0 6 0 0 0 13 0 2 0 2 ...
$ DelinquenciesOver60Days : int 0 4 0 0 0 0 0 0 0 1 ...
$ DelinquenciesOver90Days : int 0 10 0 0 0 0 0 0 0 0 ...
$ IsHomeowner : logi FALSE FALSE TRUE FALSE FALSE FALSE ...
$ LoanStatus : Factor w/ 4 levels "1","2","3","4": 4 2 2 4 4 4 4 4 4 3 ...
summary(credit)
ListingCategory IncomeRange StatedMonthlyIncome IncomeVerifiable
Min. : 0.000 Min. :1.000 Min. : 0 Mode :logical
1st Qu.: 1.000 1st Qu.:3.000 1st Qu.: 3167 FALSE:784
Median : 2.000 Median :4.000 Median : 4750 TRUE :7796
Mean : 4.997 Mean :4.089 Mean : 5755 NA's :0
3rd Qu.: 7.000 3rd Qu.:5.000 3rd Qu.: 7083
Max. :20.000 Max. :7.000 Max. :250000
DTIwProsperLoan EmploymentStatusDescription
Min. : 0.0 Employed :7182
1st Qu.: 0.1 Full-time : 416
Median : 0.2 Not employed : 122
Mean : 91609.4 Other : 475
3rd Qu.: 0.3 Part-time : 7
Max. :1000000.0 Retired : 32
Self-employed: 346
Occupation MonthsEmployed BorrowerState
Other :2421 Min. :-23.00 CA :1056
Professional :1040 1st Qu.: 26.00 FL : 608
Computer Programmer : 345 Median : 68.00 NY : 574
Executive : 334 Mean : 97.44 TX : 532
Administrative Assistant: 325 3rd Qu.:139.00 IL : 443
Teacher : 301 Max. :755.00 GA : 343
(Other) :3814 NA's :5 (Other):5024
BorrowerCity BorrowerMetropolitanArea LenderIndicator
CHICAGO : 121 (Not Implemented):8580 Min. :0.00000
NEW YORK : 91 1st Qu.:0.00000
BROOKLYN : 88 Median :0.00000
HOUSTON : 64 Mean :0.09196
LAS VEGAS: 53 3rd Qu.:0.00000
ATLANTA : 51 Max. :1.00000
(Other) :8112
GroupIndicator GroupName
Mode :logical :8326
FALSE:8325 We do not accept new membership requests: 39
TRUE :255 BORROWERS - LARGEST GROUP : 29
NA's :0 LendersClub : 17
Debt Consolidators : 12
Have Money - Will Bid : 10
(Other) : 147
ChannelCode AmountParticipation MonthlyDebt CurrentDelinquencies
Min. :40000 Min. :0 Min. : 0.0 Min. : 0.0000
1st Qu.:80000 1st Qu.:0 1st Qu.: 364.0 1st Qu.: 0.0000
Median :80000 Median :0 Median : 708.0 Median : 0.0000
Mean :77196 Mean :0 Mean : 885.5 Mean : 0.4119
3rd Qu.:90000 3rd Qu.:0 3rd Qu.: 1205.2 3rd Qu.: 0.0000
Max. :90000 Max. :0 Max. :30213.0 Max. :21.0000
DelinquenciesLast7Years PublicRecordsLast10Years PublicRecordsLast12Months
Min. : 0.000 Min. : 0.0000 Min. :0.00000
1st Qu.: 0.000 1st Qu.: 0.0000 1st Qu.:0.00000
Median : 0.000 Median : 0.0000 Median :0.00000
Mean : 4.009 Mean : 0.2809 Mean :0.01364
3rd Qu.: 3.000 3rd Qu.: 0.0000 3rd Qu.:0.00000
Max. :99.000 Max. :11.0000 Max. :4.00000
FirstRecordedCreditLine CreditLinesLast7Years InquiriesLast6Months
12/1/93 0:00: 20 Min. : 2.0 Min. : 0.0000
3/1/95 0:00 : 19 1st Qu.: 16.0 1st Qu.: 0.0000
6/1/90 0:00 : 17 Median : 24.0 Median : 1.0000
6/1/89 0:00 : 16 Mean : 26.1 Mean : 0.9994
12/1/90 0:00: 15 3rd Qu.: 34.0 3rd Qu.: 1.0000
2/1/94 0:00 : 14 Max. :115.0 Max. :15.0000
(Other) :8479
AmountDelinquent CurrentCreditLines OpenCreditLines BankcardUtilization
Min. : 0 Min. : 0.000 Min. : 0.000 Min. :0.0000
1st Qu.: 0 1st Qu.: 5.000 1st Qu.: 5.000 1st Qu.:0.2500
Median : 0 Median : 9.000 Median : 8.000 Median :0.5400
Mean : 1195 Mean : 9.345 Mean : 8.306 Mean :0.5182
3rd Qu.: 0 3rd Qu.:12.000 3rd Qu.:11.000 3rd Qu.:0.7900
Max. :179158 Max. :54.000 Max. :42.000 Max. :2.2300
TotalOpenRevolvingAccounts InstallmentBalance RealEstateBalance
Min. : 0.000 Min. : 0 Min. : 0
1st Qu.: 3.000 1st Qu.: 3338 1st Qu.: 0
Median : 6.000 Median : 14453 Median : 26154
Mean : 6.441 Mean : 24900 Mean : 109306
3rd Qu.: 9.000 3rd Qu.: 32238 3rd Qu.: 176542
Max. :44.000 Max. :739371 Max. :1938421
NA's :328
RevolvingBalance RealEstatePayment RevolvingAvailablePercent TotalInquiries
Min. : 0 Min. : 0.0 Min. : 0.00 Min. : 0.00
1st Qu.: 2799 1st Qu.: 0.0 1st Qu.: 29.00 1st Qu.: 2.00
Median : 8784 Median : 346.5 Median : 52.00 Median : 3.00
Mean : 19555 Mean : 830.5 Mean : 51.46 Mean : 3.91
3rd Qu.: 21110 3rd Qu.: 1382.2 3rd Qu.: 75.00 3rd Qu.: 5.00
Max. :695648 Max. :13651.0 Max. :100.00 Max. :36.00
TotalTradeItems SatisfactoryAccounts NowDelinquentDerog WasDelinquentDerog
Min. : 2.0 Min. : 1.00 Min. : 0.0000 Min. : 0.000
1st Qu.: 16.0 1st Qu.: 14.00 1st Qu.: 0.0000 1st Qu.: 0.000
Median : 24.0 Median : 21.00 Median : 0.0000 Median : 1.000
Mean : 26.1 Mean : 23.34 Mean : 0.4119 Mean : 2.343
3rd Qu.: 34.0 3rd Qu.: 30.25 3rd Qu.: 0.0000 3rd Qu.: 3.000
Max. :115.0 Max. :113.00 Max. :21.0000 Max. :32.000
OldestTradeOpenDate DelinquenciesOver30Days DelinquenciesOver60Days
Min. : 1011957 Min. : 0.000 Min. : 0.000
1st Qu.: 4101996 1st Qu.: 0.000 1st Qu.: 0.000
Median : 7191993 Median : 1.000 Median : 0.000
Mean : 6934230 Mean : 4.332 Mean : 1.908
3rd Qu.:10011990 3rd Qu.: 5.000 3rd Qu.: 2.000
Max. :12312004 Max. :99.000 Max. :73.000
DelinquenciesOver90Days IsHomeowner LoanStatus
Min. : 0.000 Mode :logical 1:1847
1st Qu.: 0.000 FALSE:4264 2:1262
Median : 0.000 TRUE :4316 3: 256
Mean : 4.009 NA's :0 4:5215
3rd Qu.: 3.000
Max. :99.000
try(na.fail(credit))
glmFit <- train(LoanStatus~., credit, method = "glm", family=binomial, preProcess=c("pca"),
trControl = trainControl(method = "cv"))
Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
  contrasts can be applied only to factors with 2 or more levels
logregFit <- train(LoanStatus~., credit, method = "logreg", family=binomial, preProcess=c("pca"),
trControl = trainControl(method = "cv"))
Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
  contrasts can be applied only to factors with 2 or more levels

Best answer
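Looking at the error message and the variables in your dataset, the variable BorrowerMetropolitanArea has only one level (in fact, if every sample has the same value, it has no predictive value at all). My guess is that this is what triggers the problem in the contrasts function when the dataset is preprocessed with PCA.
Try calling the train function on the dataset without the BorrowerMetropolitanArea variable.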
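As a minimal sketch of that suggestion, the snippet below finds factor columns with fewer than two levels and drops them before any model is fitted. It runs on a small made-up data frame, `toy`, which is only a hypothetical stand-in for `credit`; the names `toy`, `is_single_level`, and `toy_clean` are illustrative and not part of the original post.

```r
# Toy data standing in for `credit` (hypothetical values, not the real data).
toy <- data.frame(
  MonthlyDebt = c(247, 785, 1631, 817),
  BorrowerMetropolitanArea = factor(rep("(Not Implemented)", 4)),
  EmploymentStatusDescription = factor(c("Employed", "Other", "Employed", "Retired")),
  LoanStatus = factor(c(4, 2, 2, 4))
)

# TRUE for every factor column that has fewer than two levels.
is_single_level <- vapply(
  toy,
  function(col) is.factor(col) && nlevels(col) < 2,
  logical(1)
)

# Report which columns would be dropped.
print(names(toy)[is_single_level])

# Keep only the columns that can actually be given contrasts.
toy_clean <- toy[, !is_single_level, drop = FALSE]
str(toy_clean)
```

On the real data the same check would presumably flag BorrowerMetropolitanArea, after which the original `train(LoanStatus~., ...)` call can be retried on the cleaned data frame. Whether that call then succeeds depends on the rest of the data, so treat this as a first step rather than a complete fix.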