Problem description
I am using the glmnet package to perform a LASSO regression. I am now working on feature importance using the caret package. What I don't understand is the value of the importance. Could anyone enlighten me? Is there any formula to calculate these values or does that mean that these values are based on the beta values?
ROC curve variable importance

  only 7 most important variables shown (out of 25)

         Importance
feature1     0.8974
feature2     0.8962
feature3     0.8957
feature4     0.8744
feature5     0.8701
feature6     0.8658
feature7     0.8253
Recommended answer
caret actually looks at the final coefficients of the fit and takes the absolute value of each one. These absolute coefficients are stored as the variable importance, and the variables are ranked by them.
To view the source code, you can type
getModelInfo("glmnet")$glmnet$varImp
To summarize, these are the lines to calculate it:
function(object, lambda = NULL, ...) {
  ## skipping a few lines
  ## coefficients at the chosen lambda
  beta <- predict(object, s = lambda, type = "coef")
  if(is.list(beta)) {
    ## multi-response fit (e.g. multinomial): one column per response
    out <- do.call("cbind", lapply(beta, function(x) x[,1]))
    out <- as.data.frame(out)
  } else out <- data.frame(Overall = beta[,1])
  ## drop the intercept and take absolute values
  out <- abs(out[rownames(out) != "(Intercept)",,drop = FALSE])
  out
}
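To make the calculation concrete, here is a minimal sketch that repeats the same steps by hand on a plain glmnet fit. The mtcars data, the standardization of the predictors, and the value of lambda are assumptions chosen only for illustration; they are not taken from the question.

```r
library(glmnet)

# Illustrative data (an assumption): predict mpg from the other mtcars columns.
# Standardizing the predictors makes coefficient magnitudes comparable.
x <- scale(as.matrix(mtcars[, -1]))
y <- mtcars$mpg

# alpha = 1 selects the LASSO penalty
fit <- glmnet(x, y, alpha = 1)

# Hypothetical penalty value; caret would use the tuned lambda instead
lambda <- 0.5

# Same steps as the varImp code: coefficients at lambda,
# drop the intercept, take absolute values
beta <- predict(fit, s = lambda, type = "coefficients")
imp <- data.frame(Overall = beta[, 1])
imp <- abs(imp[rownames(imp) != "(Intercept)", , drop = FALSE])

# Sort from most to least important
imp <- imp[order(-imp$Overall), , drop = FALSE]
print(imp)
```

Variables that the LASSO shrinks to exactly zero get an importance of 0. For a model trained with caret::train, calling varImp(model, scale = FALSE) should give these unscaled absolute coefficients at the tuned lambda; with the default scale = TRUE, the values are rescaled to a 0 to 100 range.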