Problem description
I'm doing variable selection with the Boruta package in R. Boruta gives me the standard series of boxplots in a single graph, which is useful, but because I have too many predictors, I would like to limit the number of boxplots that appear in the Boruta plot. Something like the following image.
Basically, I want to "zoom" in on the right end of the plot, but have no idea how to do that with the Boruta plot object.
Thanks
MR
Recommended answer
Sounds like a simple question, but the solution seems surprisingly convoluted. Perhaps somebody can come up with a quicker/more elegant way...
Here, I create a new function based on the source function plot.Boruta, and add a function argument pars that takes the names of the variables/predictors that we'd like to include in the plot.
As an example, I use the iris dataset to fit a model.
# Fit model to the iris dataset
library(Boruta)
fit <- Boruta(Species ~ ., data = iris, doTrace = 2)
The function generateCol is called internally by plot.Boruta, but it is not exported and is therefore not directly available outside of the package. Since our revised plot.Boruta routine needs it, we define a copy of it ourselves.
# generateCol is needed by plot.Boruta
generateCol <- function(x, colCode, col, numShadow) {
  # Checking arguments (&& because this is a scalar condition)
  if (is.null(col) && length(colCode) != 4)
    stop('colCode should have 4 elements.')
  # Generating col: one color per attribute, plus one per shadow attribute
  if (is.null(col)) {
    cc <- rep(colCode[4], length(x$finalDecision) + numShadow)
    cc[c(x$finalDecision == 'Confirmed', rep(FALSE, numShadow))] <- colCode[1]
    cc[c(x$finalDecision == 'Tentative', rep(FALSE, numShadow))] <- colCode[2]
    cc[c(x$finalDecision == 'Rejected', rep(FALSE, numShadow))] <- colCode[3]
    col <- cc
  }
  return(col)
}
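As an alternative to copying the function, a non-exported function can usually be fetched from the package namespace at run time. The following is only a sketch: get_internal is a hypothetical helper name, and it assumes that the installed Boruta version still defines an internal function called generateCol. If it does not, the helper returns NULL and the copy above can be used instead.

```r
# Hypothetical helper: fetch a non-exported function from a package namespace,
# or return NULL if the package or the function is not available.
get_internal <- function(name, pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) return(NULL)
  ns <- asNamespace(pkg)
  if (!exists(name, envir = ns, inherits = FALSE)) return(NULL)
  get(name, envir = ns, inherits = FALSE)
}

# Assumption: the installed Boruta version still has an internal generateCol
internal_generateCol <- get_internal("generateCol", "Boruta")
```

Relying on package internals is fragile across versions, which is one reason to keep the explicit copy.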
We now modify plot.Boruta and add a function parameter pars, by which we filter the list of variables to plot.
# Modified plot.Boruta
plot.Boruta.sel <- function(
    x,
    pars = NULL,
    colCode = c('green', 'yellow', 'red', 'blue'),
    sort = TRUE,
    whichShadow = c(TRUE, TRUE, TRUE),
    col = NULL, xlab = 'Attributes', ylab = 'Importance', ...) {
  # Checking arguments
  if (!inherits(x, 'Boruta'))
    stop('This function needs Boruta object as an argument.')
  if (is.null(x$ImpHistory))
    stop('Importance history was not stored during the Boruta run.')
  # Removal of -Infs and conversion to a list
  lz <- lapply(seq_len(ncol(x$ImpHistory)), function(i)
    x$ImpHistory[is.finite(x$ImpHistory[, i]), i])
  names(lz) <- colnames(x$ImpHistory)
  # Selection of shadow meta-attributes
  numShadow <- sum(whichShadow)
  lz <- lz[c(rep(TRUE, length(x$finalDecision)), whichShadow)]
  # Generating color vector
  col <- generateCol(x, colCode, col, numShadow)
  # Ordering boxes by attribute median importance
  if (sort) {
    ii <- order(sapply(lz, stats::median))
    lz <- lz[ii]
    col <- col[ii]
  }
  # Select parameters of interest, keeping the colors aligned with the boxes
  if (!is.null(pars)) {
    keep <- names(lz) %in% pars
    lz <- lz[keep]
    col <- col[keep]
  }
  # Final plotting
  graphics::boxplot(lz, xlab = xlab, ylab = ylab, col = col, ...)
  invisible(x)
}
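One detail worth keeping in mind when filtering: the color vector col runs parallel to the list lz, so any subset applied to one has to be applied to the other, otherwise boxes can end up with the wrong colors. A toy sketch with made-up values (the names and colors below are purely illustrative, not Boruta output):

```r
# Made-up boxes and a parallel color vector
boxes  <- list(a = 1:3, b = 4:6, c = 7:9)
colors <- c("red", "green", "blue")

# Use one logical index for both, so each box keeps its own color
keep   <- names(boxes) %in% c("a", "c")
boxes  <- boxes[keep]
colors <- colors[keep]
```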
Now all we need to do is call plot.Boruta.sel instead of plot, and specify the variables that we'd like to include.
plot.Boruta.sel(fit, pars = c("Sepal.Length", "Sepal.Width"))
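To get the "zoom on the right end" effect from the question without typing names by hand, the names passed to pars could be picked by median importance. This is a minimal sketch rather than part of Boruta: top_by_median is a hypothetical helper, and the matrix below is a made-up stand-in for the importance history, which can contain -Inf entries that are dropped before taking the median.

```r
# Hypothetical helper: names of the n columns with the highest median,
# ignoring non-finite entries such as -Inf.
top_by_median <- function(imp_history, n) {
  med <- apply(imp_history, 2, function(v) stats::median(v[is.finite(v)]))
  med <- sort(med, decreasing = TRUE)  # columns with no finite values are dropped
  utils::head(names(med), n)
}

# Toy importance history (made-up numbers, not real Boruta output)
toy <- cbind(a = c(1, 2, 3), b = c(10, -Inf, 12), c = c(5, 6, 7))
top_by_median(toy, 2)
# returns "b" "c"
```

With a fitted model, something like plot.Boruta.sel(fit, pars = top_by_median(fit$ImpHistory, 10)) should then show only the ten highest-ranked boxes. Note that the columns of ImpHistory also include the shadow attributes, so those may appear among the selected names.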