问题描述
我想在 data.frame
中添加最小和最大日期之间的所有缺失日期,并对所有缺失值进行线性插值,例如
I would like to add all missing dates between min and max date in a data.frame
and linear interpolate all missing values, like
df <- data.frame(date = as.Date(c("2015-10-05","2015-10-08","2015-10-09",
"2015-10-12","2015-10-14")),
value = c(8,3,9,NA,5))
date value
2015-10-05 8
2015-10-08 3
2015-10-09 9
2015-10-12 NA
2015-10-14 5
date value approx
2015-10-05 8 8
2015-10-06 NA 6.33
2015-10-07 NA 4.67
2015-10-08 3 3
2015-10-09 9 9
2015-10-10 NA 8.20
2015-10-11 NA 7.40
2015-10-12 NA 6.60
2015-10-13 NA 5.80
2015-10-14 5 5
dplyr
和 approx
是否有明确的解决方案?(我不喜欢我的 10 行 for
循环代码.)
Is there a clear solution with dplyr
and approx
?(I do not like my 10 line for
loop code.)
推荐答案
这是一种方法.我使用第一个和最后一个日期创建了一个带有日期序列的数据框.使用 dplyr
包中的 full_join()
,我合并了数据框和 mydf
.然后我使用 zoo 包中的 na.approx()
来处理 mutate()
部分中的插值.
Here is one way. I created a data frame with a sequence of date using the first and last date. Using full_join()
in the dplyr
package, I merged the data frame and mydf
. I then used na.approx()
in the zoo package to handle the interpolation in the mutate()
part.
mydf <- data.frame(date = as.Date(c("2015-10-05","2015-10-08","2015-10-09",
"2015-10-12","2015-10-14")),
value = c(8,3,9,NA,5))
library(dplyr)
library(zoo)
data.frame(date = seq(mydf$date[1], mydf$date[nrow(mydf)], by = 1)) %>%
full_join(mydf, by = "date") %>%
mutate(approx = na.approx(value))
# date value approx
#1 2015-10-05 8 8.000000
#2 2015-10-06 NA 6.333333
#3 2015-10-07 NA 4.666667
#4 2015-10-08 3 3.000000
#5 2015-10-09 9 9.000000
#6 2015-10-10 NA 8.200000
#7 2015-10-11 NA 7.400000
#8 2015-10-12 NA 6.600000
#9 2015-10-13 NA 5.800000
#10 2015-10-14 5 5.000000
这篇关于线性插值时间序列中的缺失值的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!