Problem description

I'm doing variable selection with the Boruta package in R. Boruta gives me the standard series of boxplots in a single graph, which is useful, but since I have too many predictors, I would like to limit the number of boxplots that appear in the Boruta plot. Something like the following image.

Basically, I want to "zoom" in on the right end of the plot, but I have no idea how to do that with the Boruta plot object.

Thanks

MR

Recommended answer

Sounds like a simple question, but the solution seems surprisingly convoluted. Perhaps somebody can come up with a quicker or more elegant way...

Here, I create a new function based on the source function plot.Boruta, and add a function argument pars that takes the names of variables/predictors that we'd like to include in the plot.

As an example, I use the iris dataset to fit a model.

# Fit model to the iris dataset
library(Boruta);
fit <- Boruta(Species ~ ., data = iris, doTrace = 2);

The function generateCol is internally called by plot.Boruta, but is not exported and therefore not available outside of the package. However, we need the function for our revised plot.Boruta routine.

# generateCol is needed by plot.Boruta
generateCol<-function(x,colCode,col,numShadow){
 #Checking arguments
 if(is.null(col) && length(colCode)!=4)
  stop('colCode should have 4 elements.');
 #Generating col
 if(is.null(col)){
  rep(colCode[4],length(x$finalDecision)+numShadow)->cc;
  cc[c(x$finalDecision=='Confirmed',rep(FALSE,numShadow))]<-colCode[1];
  cc[c(x$finalDecision=='Tentative',rep(FALSE,numShadow))]<-colCode[2];
  cc[c(x$finalDecision=='Rejected',rep(FALSE,numShadow))]<-colCode[3];
  col=cc;
 }
 return(col);
}
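
As an alternative to copying the source, the helper can probably be fetched straight from the package namespace. The following is only a sketch: it assumes that the installed Boruta version still defines an internal function named generateCol, which is not guaranteed because unexported functions can change between releases.

```r
# Sketch: reuse the unexported helper instead of copying its source.
# Assumption: the installed Boruta version still has an internal
# function called generateCol; if not, nothing is assigned here and
# the hand-copied definition is needed.
if (requireNamespace("Boruta", quietly = TRUE) &&
    exists("generateCol", envir = asNamespace("Boruta"), inherits = FALSE)) {
  generateCol <- utils::getFromNamespace("generateCol", "Boruta")
}
```

Copying the function is the more robust choice if the script has to keep working across package updates.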

We now modify plot.Boruta, and add a function parameter pars, by which we filter our list of variables.

# Modified plot.Boruta
plot.Boruta.sel <- function(
    x,
    pars = NULL,
    colCode = c('green','yellow','red','blue'),
    sort = TRUE,
    whichShadow = c(TRUE, TRUE, TRUE),
    col = NULL, xlab = 'Attributes', ylab = 'Importance', ...) {

    #Checking arguments
    if(!inherits(x, 'Boruta'))
        stop('This function needs Boruta object as an argument.');
    if(is.null(x$ImpHistory))
        stop('Importance history was not stored during the Boruta run.');

    #Removal of -Infs and conversion to a list
    lz <- lapply(1:ncol(x$ImpHistory), function(i)
        x$ImpHistory[is.finite(x$ImpHistory[,i]),i]);
    colnames(x$ImpHistory)->names(lz);

    #Selection of shadow meta-attributes
    numShadow <- sum(whichShadow);
    lz <- lz[c(rep(TRUE,length(x$finalDecision)), whichShadow)];

    #Generating color vector
    col <- generateCol(x, colCode, col, numShadow);

    #Ordering boxes due to attribute median importance
    if (sort) {
        ii <- order(sapply(lz, stats::median));
        lz <- lz[ii];
        col <- col[ii];
    }

    # Select parameters of interest, keeping the colors aligned with the boxes
    if (!is.null(pars)) {
        keep <- names(lz) %in% pars;
        lz <- lz[keep];
        col <- col[keep];
    }

    #Final plotting
    graphics::boxplot(lz, xlab = xlab, ylab = ylab, col = col, ...);
    invisible(x);
}

Now all we need to do is call plot.Boruta.sel instead of plot, and specify the variables that we'd like to include.

plot.Boruta.sel(fit, pars = c("Sepal.Length", "Sepal.Width"));
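
To get the "zoom on the right end" effect without typing names by hand, the names of the highest-ranked attributes can be computed first and then passed as pars. The helper below is a hypothetical sketch (top_n_by_median is not part of Boruta). It assumes the importance history is a numeric matrix with one named column per attribute, as fit$ImpHistory appears to be, and it is demonstrated on a small mock matrix.

```r
# Hypothetical helper (not part of Boruta): names of the n columns with
# the highest median importance, ignoring non-finite entries such as -Inf.
top_n_by_median <- function(imp_history, n = 10) {
  stopifnot(is.matrix(imp_history), !is.null(colnames(imp_history)))
  # Median of the finite importance values in each column
  med <- apply(imp_history, 2, function(v) stats::median(v[is.finite(v)]))
  # sort() is ascending and drops NA medians; keep the last n names
  names(utils::tail(sort(med), n))
}

# Mock importance history: 4 runs (rows) x 4 attributes (columns)
imp <- cbind(a = c(1, 2, 3, 4),
             b = c(10, 11, -Inf, -Inf),
             c = c(5, 6, 7, 8),
             d = c(0.1, 0.2, 0.3, 0.4))
top2 <- top_n_by_median(imp, n = 2)
print(top2)  # the two highest medians, in ascending order: "c" "b"
```

With a fitted model, a call along the lines of plot.Boruta.sel(fit, pars = top_n_by_median(fit$ImpHistory, n = 5)) should then show only the five rightmost boxes. Note that the shadow attributes are columns of the history too, so they can end up in the selection.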
