Problem description
I am using the lmrob function from the robustbase package in R for robust regression. I call it as rob_reg <- lmrob(y ~ 0 + ., dat, method = "MM", control = a1).
To get the summary I use summary(rob_reg), and one thing robust regression does is identify outliers in the data. Part of the summary output gives me the following:
6508 observations c(49,55,58,77,104,105,106,107,128,134,147,153,...) are outliers with |weight| <= 1.4e-06 ( < 1.6e-06);
This lists all the outliers, 6508 in this case (I removed most of them and replaced them with ...). I need to somehow get these outliers and remove them from my data. What I did before was to use summary(rob_reg)$rweights to get the weights for all observations and remove those with a weight below a certain value; in the example above that value would be 1.6e-06.
Is there a way to get a list of only the outliers without first getting the weights of all the observations?
Recommended answer
This is an old post but I recently had a need for this so I thought I'd share my solution.
#load the package that provides lmrob
library(robustbase)
#fit the model
fit = lmrob(y ~ x, data)
#create a model summary
fit.summary = summary(fit)
#extract the outlier threshold weight from the summary
out.thresh = fit.summary$control$eps.outlier
#eps.outlier may be stored as a function of the number of observations; if so, evaluate it
if(is.function(out.thresh)) out.thresh = out.thresh(length(fit.summary$rweights))
#returns the weights corresponding to the outliers
#names(out.liers) corresponds to the row names of the observations
out.liers = fit.summary$rweights[which(fit.summary$rweights <= out.thresh)]
#add a TRUE/FALSE variable for outlier to the original data by matching row.names of the original data to names of the list of outliers
#(vectorized; the original loop indexed data[i], which selects a column, and was missing a closing parenthesis)
data$outlier = row.names(data) %in% names(out.liers)
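Since the question also asks how to remove the flagged observations, here is a minimal sketch of that step on simulated data. The data frame dat, the vector bad, and the other variable names are made up for illustration. The sketch assumes there are no missing values and that eps.outlier is either a number or a function of the number of observations. It takes the weights with weights(fit, type = "robustness") rather than from the summary.

```r
library(robustbase)

# Hypothetical data: a linear trend with five gross outliers added by hand
set.seed(1)
dat <- data.frame(x = 1:100)
dat$y <- 2 * dat$x + rnorm(100)
bad <- c(10, 30, 50, 70, 90)
dat$y[bad] <- dat$y[bad] + 500

fit <- lmrob(y ~ x, data = dat, method = "MM")

# Robustness weights, one per observation used in the fit
rw <- weights(fit, type = "robustness")

# The threshold is either a number or a function of the number of observations
eps <- fit$control$eps.outlier
if (is.function(eps)) eps <- eps(length(rw))

# Positions of the observations whose weight falls below the threshold
outlier_idx <- unname(which(rw < eps))

# Drop them, guarding against the case where nothing was flagged
dat_clean <- if (length(outlier_idx) > 0) dat[-outlier_idx, , drop = FALSE] else dat
```

The weights for all observations still have to be computed, since the summary derives its outlier list from them; the sketch only avoids reading the indices off the printed output. The robust fit already downweights these observations, so whether to delete them outright is a separate modelling decision.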