问题描述
有没有一种方法可以在 zoo
或 xts
对象中填充 NA
s,但 NA
向前.换句话说,就像填充 NA
s 最多 3 个连续的 NA
s,然后从第 4 个值开始保持 NA
s 直到一个有效数字.
Is there a way we can fill NA
s in a zoo
or xts
object with limited number of NA
s forward. In other words like fill NA
s up to 3 consecutive NA
s, and then keep the NA
s from the 4th value on until a valid number.
类似的东西.
library(zoo)
x <- zoo(1:20, Sys.Date() + 1:20)
x[c(2:4, 6:10, 13:18)] <- NA
x
2014-09-20 2014-09-21 2014-09-22 2014-09-23 2014-09-24 2014-09-25 2014-09-26
1 NA NA NA 5 NA NA
2014-09-27 2014-09-28 2014-09-29 2014-09-30 2014-10-01 2014-10-02 2014-10-03
NA NA NA 11 12 NA NA
2014-10-04 2014-10-05 2014-10-06 2014-10-07 2014-10-08 2014-10-09
NA NA NA NA 19 20
所需的输出,将是变量 n = 3 的东西
Desired output, will be something with variable n = 3 is
2014-09-20 2014-09-21 2014-09-22 2014-09-23 2014-09-24 2014-09-25 2014-09-26
1 1 1 1 5 5 5
2014-09-27 2014-09-28 2014-09-29 2014-09-30 2014-10-01 2014-10-02 2014-10-03
5 NA NA 11 12 12 12
2014-10-04 2014-10-05 2014-10-06 2014-10-07 2014-10-08 2014-10-09
12 NA NA NA 19 20
我尝试了很多与 na.locf(x, maxgap = 3)
等的组合,但没有取得多大成功.我可以创建一个循环来获得所需的输出,我想知道是否有矢量化的方式来实现这一点.
I have tried lot of combination with na.locf(x, maxgap = 3)
etc without much success. I can create a loop to get the desired output, I was wondering whether there is vectorized way of achieving this.
fillInTheBlanks <- function(v, n=3) {
result <- v
counter0 <- 1
for(i in 2:length(v)) {
value <- v[i]
if (is.na(value)) {
if (counter0 > n) {
result[i] <- v[i]
} else {
result[i] <- result[i-1]
counter0 <- counter0 + 1
} }
else {
result[i] <- v[i]
counter0 <- 1
}
}
return(result)
}
谢谢
推荐答案
这是另一种方式:
l <- cumsum(! is.na(x))
c(NA, x[! is.na(x)])[replace(l, ave(l, l, FUN=seq_along) > 4, 0) + 1]
# [1] 1 1 1 1 5 5 5 5 NA NA 11 12 12 12 12 NA NA NA 19 20
edit:我之前的回答要求 x
没有重复项.目前的答案没有.
edit: my previous answer required that x
have no duplicates. The current answer does not.
基准测试
x <- rep(x, length.out=1e4)
plourde <- function(x) {
l <- cumsum(! is.na(x))
c(NA, x[! is.na(x)])[replace(l, ave(l, l, FUN=seq_along) > 4, 0) + 1]
}
agstudy <- function(x) {
unlist(sapply(split(coredata(x),cumsum(!is.na(x))),
function(sx){
if(length(sx)>3)
sx[2:4] <- rep(sx[1],3)
else sx <- rep(sx[1],length(sx))
sx
}))
}
microbenchmark(plourde(x), agstudy(x))
# Unit: milliseconds
# expr min lq median uq max neval
# plourde(x) 5.30 5.591 6.409 6.774 57.13 100
# agstudy(x) 16.04 16.249 16.454 17.516 20.64 100
这篇关于仅将时间序列中的 NA 填充到有限的数量的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!