Variable importance in linear discriminant analysis

Problem description

When doing a linear discriminant analysis with the R MASS package, is there a way to get a measure of variable importance?

library(MASS)
### import data and do some preprocessing
fit <- lda(cat ~ ., data = train)

I have a data set with about 20 measurements that I use to predict a binary category. The measurements are hard to obtain, so I want to reduce them to the most influential ones.

When using rpart or randomForest, I can get a list of variable importances, or a Gini decrease statistic, using summary() or importance().

Is there a built-in function to do this that I can't find? Or, if I have to code one, what would be a good way to go about it?

Recommended answer

I would recommend using the "caret" package.

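library(caret)
data(mdrr)
# drop near-zero-variance predictors, then one of each highly correlated pair
mdrrDescr <- mdrrDescr[, -nearZeroVar(mdrrDescr)]
mdrrDescr <- mdrrDescr[, -findCorrelation(cor(mdrrDescr), .8)]
# hold out 25% of the samples as a test set
set.seed(1)
inTrain <- createDataPartition(mdrrClass, p = .75, list = FALSE)[,1]
train <- mdrrDescr[ inTrain, ]
test  <- mdrrDescr[-inTrain, ]
trainClass <- mdrrClass[ inTrain]
testClass  <- mdrrClass[-inTrain]

# recursive feature elimination with LDA, evaluated by cross-validation
set.seed(2)
ldaProfile <- rfe(train, trainClass,
                  sizes = c(1:10, 15, 30),
                  rfeControl = rfeControl(functions = ldaFuncs, method = "cv"))

# accuracy on the held-out test set
# (depending on the caret version, predict() may return a data frame;
#  in that case pass its "pred" column instead)
postResample(predict(ldaProfile, test), testClass)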

Once the object "ldaProfile" has been created, you can retrieve the best subset of variables:

ldaProfile$optVariables
[1] "X5v"    "VRA1"   "D.Dr06" "Wap"    "G1"     "Jhetm"  "QXXm"
[8] "nAB"    "H3D"    "nR06"   "TI2"    "nBnz"   "Xt"     "VEA1"
[15] "TIE"

You can also get a nice plot of the number of variables used vs. accuracy.
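The answer does not show the plotting call itself, so the following is only a minimal sketch of how it might look. It assumes the caret and MASS packages are installed, and it uses a two-class subset of the built-in iris data as a stand-in for the real measurements. The names dat, x, y and profile are illustrative and do not come from the original answer.

```r
library(caret)   # rfe(), rfeControl(), ldaFuncs, predictors()
library(MASS)    # lda(), which ldaFuncs fits internally

# Assumption: a small two-class stand-in data set instead of the real one
dat <- droplevels(subset(iris, Species != "setosa"))
x <- dat[, 1:4]      # four numeric measurements
y <- dat$Species     # binary class label

# Recursive feature elimination with LDA, evaluated by 10-fold CV
set.seed(2)
profile <- rfe(x, y,
               sizes = 1:3,
               rfeControl = rfeControl(functions = ldaFuncs, method = "cv"))

profile$results      # resampled Accuracy and Kappa per subset size
predictors(profile)  # variables in the selected subset

# Accuracy against the number of variables ("g" = grid, "o" = points and lines)
plot(profile, type = c("g", "o"))
```

With the object from the answer, the equivalent call would presumably be plot(ldaProfile, type = c("g", "o")).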
