caret package: glmnet variable importance

Question

I am using the glmnet package to perform a LASSO regression. I am now working on feature importance using the caret package. What I don't understand is the value of the importance. Could anyone enlighten me? Is there any formula to calculate these values or does that mean that these values are based on the beta values?

ROC curve variable importance

  only 7 most important variables shown (out of 25)

         Importance
feature1     0.8974
feature2     0.8962
feature3     0.8957
feature4     0.8744
feature5     0.8701
feature6     0.8658
feature7     0.8253

Recommended answer

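For a glmnet model, caret takes the coefficients of the final fit at the selected lambda, drops the intercept, and uses the absolute value of each remaining coefficient as that variable's importance. So yes, the values are based on the beta values, and variables are ranked by the size of their absolute coefficient.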
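As a rough illustration of where these numbers come from, here is a minimal sketch that trains a LASSO model through caret and asks for the unscaled importance. The data, the feature names and the single lambda value are all made up for the example; only `train`, `trainControl` and `varImp` are caret functions.

```r
library(caret)

set.seed(1)
# Simulated data (hypothetical): 200 rows, 5 predictors, two-class outcome
x <- matrix(rnorm(200 * 5), nrow = 200,
            dimnames = list(NULL, paste0("feature", 1:5)))
y <- factor(ifelse(rbinom(200, 1, plogis(2 * x[, 1] - x[, 2])) == 1, "yes", "no"))

# alpha = 1 gives the LASSO penalty; lambda = 0.05 is an arbitrary choice
model <- train(x = x, y = y, method = "glmnet",
               tuneGrid = expand.grid(alpha = 1, lambda = 0.05),
               trControl = trainControl(method = "cv", number = 5))

# scale = FALSE returns the raw values instead of rescaling them to 0-100
imp <- varImp(model, scale = FALSE)
imp$importance
```

With the default `scale = TRUE`, the same values are rescaled so that the largest one is 100, which changes the numbers but not the ranking.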

To view the source code, you can type

library(caret)
getModelInfo("glmnet")$glmnet$varImp

To summarize, these are the lines to calculate it:

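function(object, lambda = NULL, ...) {

  ## skipping a few lines

  ## coefficients of the fitted model at the penalty value lambda
  beta <- predict(object, s = lambda, type = "coef")
  if(is.list(beta)) {
    ## multi-response fit (e.g. multinomial): one coefficient column per response
    out <- do.call("cbind", lapply(beta, function(x) x[,1]))
    out <- as.data.frame(out)
  } else out <- data.frame(Overall = beta[,1])
  ## drop the intercept and keep the absolute values as the importance
  out <- abs(out[rownames(out) != "(Intercept)",,drop = FALSE])
  out
}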
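To make the calculation concrete, the sketch below repeats the same steps by hand on a plain glmnet fit, without caret. The simulated data, the feature names and the penalty value `s = 0.05` are assumptions chosen only for illustration.

```r
library(glmnet)

set.seed(1)
# Simulated data (hypothetical): 100 rows, 5 predictors, binary outcome
x <- matrix(rnorm(100 * 5), nrow = 100,
            dimnames = list(NULL, paste0("feature", 1:5)))
y <- rbinom(100, size = 1, prob = plogis(2 * x[, 1] - x[, 2]))

# LASSO logistic regression (alpha = 1)
fit <- glmnet(x, y, family = "binomial", alpha = 1)

# Coefficients at one penalty value; 0.05 is an arbitrary choice
beta <- predict(fit, s = 0.05, type = "coefficients")

# Same steps as the varImp code: drop the intercept, take absolute values
importance <- data.frame(Overall = beta[, 1])
importance <- abs(importance[rownames(importance) != "(Intercept)", , drop = FALSE])

# Sort from most to least important
importance[order(-importance$Overall), , drop = FALSE]
```

Because LASSO shrinks some coefficients exactly to zero, variables dropped by the model show an importance of 0. The values also depend on the scale of each predictor, so comparing them is most meaningful when the predictors are on comparable scales.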
