Outliers with robust regression in R



Problem description


I am using the lmrob function from the robustbase package in R for robust regression. I call it as rob_reg <- lmrob(y ~ 0 + ., dat, method = "MM", control = a1). To see the results I use summary(rob_reg), and one thing robust regression does is identify outliers in the data. One part of the summary output gives me the following:

    6508 observations c(49,55,58,77,104,105,106,107,128,134,147,153,...) are outliers with |weight| <= 1.4e-06 ( < 1.6e-06);


This lists all the outliers, 6508 in this case (I removed most of them and replaced them with ...). I need to get these outliers and remove them from my data. What I did before was use summary(rob_reg)$rweights to get the weights of all the observations and then remove the observations whose weight is below a certain value; in the example above that value would be 1.6e-06. Is there a way to get a list of only the outliers without first getting the weights of all the observations?
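The weight-based approach described in the question can be sketched in base R as follows. drop_low_weight is a hypothetical helper, not part of robustbase, and it assumes w holds the robustness weights in the same row order as df (for example summary(rob_reg)$rweights when lmrob dropped no rows for missing values). The weights below are made up for illustration, not real lmrob output.

```r
# Hypothetical helper: remove rows whose robustness weight is below eps.
# Assumes w is aligned row-by-row with df.
drop_low_weight <- function(df, w, eps) {
  stopifnot(length(w) == nrow(df))
  keep <- abs(w) >= eps          # TRUE for rows NOT flagged as outliers
  df[keep, , drop = FALSE]
}

# Usage with made-up weights: the third row plays the role of an outlier
df <- data.frame(x = 1:5, y = c(2.1, 3.9, 50, 8.2, 9.8))
w  <- c(0.98, 0.95, 0, 0.91, 0.99)
clean <- drop_low_weight(df, w, eps = 1.6e-06)
nrow(clean)   # 4
```

Subsetting with drop = FALSE keeps the result a data frame even when only one column is present, and the original row names are preserved, so removed observations can still be traced back.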

Recommended answer


This is an old post, but I recently needed this, so I thought I'd share my solution.

    library(robustbase)

    # fit the model
    fit <- lmrob(y ~ x, data = data)
    # create a model summary
    fit.summary <- summary(fit)

    # extract the outlier threshold from the summary; in recent robustbase
    # versions eps.outlier may be a function of the number of observations
    out.thresh <- fit.summary$control$eps.outlier
    if (is.function(out.thresh)) {
      out.thresh <- out.thresh(length(fit.summary$rweights))
    }

    # weights of the outliers (the summary flags |weight| < threshold);
    # names(out.liers) are the row names of the flagged observations
    rw <- fit.summary$rweights
    out.liers <- rw[abs(rw) < out.thresh]

    # add a logical outlier column to the original data by matching
    # row.names(data) against the names of the outlier weights
    data$outlier <- row.names(data) %in% names(out.liers)
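Because the outlier list in the summary is itself computed from the weights, one way to get just that list is to apply the same rule directly. The base-R sketch below assumes that control$eps.outlier is either a number or a function of the number of observations (the default is assumed here to be function(nobs) 0.1 / nobs), and that an observation counts as an outlier when abs(weight) < eps, matching the "( < 1.6e-06)" in the printed summary. outlier_positions is a hypothetical helper, not a robustbase function, and the weights are made up.

```r
# Hypothetical helper mirroring the rule behind the summary's outlier line.
# eps may be a numeric threshold or a function of the number of observations.
outlier_positions <- function(w, eps) {
  if (is.function(eps)) eps <- eps(length(w))
  which(abs(w) < eps)   # positions, like c(49,55,58,...) in the summary
}

# Usage with made-up weights; with a real fit you would pass
# fit$rweights and fit$control$eps.outlier instead.
w <- c(a = 0.97, b = 0, c = 0.88, d = 1e-08, e = 0.95)
pos <- outlier_positions(w, function(nobs) 0.1 / nobs)
pos          # named integer vector: b = 2, d = 4
names(pos)   # row names of the flagged observations
```

which() keeps the names of a named logical vector, so the result gives both the positions (as printed by summary) and the row names (for matching against the original data).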


08-22 20:19