Problem description
I have a data frame where each row is a different date and each column is a different time series.
The date range in the table is 01.01.2019-01.01.2021.
Some of the time series are relevant for only part of that range, and they have missing values on weekends and holidays.
How can I fill in the missing values of each time series with the previous day's value, but only within the relevant dates of each column? For example, if the time series in a given column runs from 01.03.2019 to 01.09.2019, I want to fill in only the missing values within that date range.
I tried to use the fill function:
data <- data %>%
fill(colnames(data))
but it also fills in the missing values after a given time series has ended.
For example, if df is:
# Date time_series_1 time_series_2
1 01-01-2019 NA 10
2 02-01-2019 5 NA
3 03-01-2019 10 NA
4 04-01-2019 20 6
5 05-01-2019 30 NA
6 06-01-2019 NA 8
7 07-01-2019 7 NA
8 08-01-2019 5 NA
9 09-01-2019 NA NA
10 10-01-2019 NA NA
The desired output is:
# Date time_series_1 time_series_2
1 01-01-2019 NA 10
2 02-01-2019 5 10
3 03-01-2019 10 10
4 04-01-2019 20 6
5 05-01-2019 30 6
6 06-01-2019 30 8
7 07-01-2019 7 NA
8 08-01-2019 5 NA
9 09-01-2019 NA NA
10 10-01-2019 NA NA
Thanks!
Recommended answer
If I understand correctly, the trick is that you want to fill downward except for the bottommost NAs. The problem with tidyr's fill is that it goes all the way down to the last row.
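To see that behaviour in isolation, here is a minimal sketch. The data frame df and its column ts are made up for illustration and are not part of the question's data.

```r
library(tidyr)

# Toy column: one leading NA, one gap in the middle, two trailing NAs
df <- data.frame(ts = c(NA, 5, NA, 7, NA, NA))

# fill() defaults to .direction = "down": each NA takes the last value above it
filled <- fill(df, ts)

# The leading NA stays NA (nothing above it), the gap becomes 5,
# and the two trailing NAs are also filled with 7
print(filled$ts)
```

The trailing rows are exactly the ones that should stay NA here, which is why the fill has to be restricted to the rows up to the last observed value.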
This isn't a fully-tidyverse solution, but for this data:
library(dplyr)
library(tidyr)
# The question's dates are day-month-year, so these are ten consecutive days
data <- tribble(
  ~Date,                 ~time_series_1, ~time_series_2,
  as.Date("2019-01-01"), NA,             10,
  as.Date("2019-01-02"), 5,              NA,
  as.Date("2019-01-03"), 10,             NA,
  as.Date("2019-01-04"), 20,             6,
  as.Date("2019-01-05"), 30,             NA,
  as.Date("2019-01-06"), NA,             8,
  as.Date("2019-01-07"), 7,              NA,
  as.Date("2019-01-08"), 5,              NA,
  as.Date("2019-01-09"), NA,             NA,
  as.Date("2019-01-10"), NA,             NA
)
You can determine the ending date for each time series separately:
LastTS1Date <- with( data, max(Date[!is.na(time_series_1)]))
LastTS2Date <- with( data, max(Date[!is.na(time_series_2)]))
And then use base R subsetting syntax to change only the part of the data frame that goes up to those dates:
data[data$Date <= LastTS1Date,] <-
data[data$Date <= LastTS1Date,] %>% fill(time_series_1)
data[data$Date <= LastTS2Date,] <-
data[data$Date <= LastTS2Date,] %>% fill(time_series_2)
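With many columns, repeating those lines for each series gets tedious. One possible generalisation, sketched here in base R, applies the same idea to every column at once. The helper name fill_within_range is hypothetical (it is not part of tidyr or of the original answer), and the sketch assumes the rows are already sorted by Date.

```r
# Hypothetical helper: carry the last observation forward, but leave
# everything after the final non-NA value untouched.
fill_within_range <- function(x) {
  observed <- which(!is.na(x))
  if (length(observed) == 0) return(x)  # all NA: nothing to carry forward
  last <- max(observed)                 # row where this series ends
  # For each row, the index of the most recent observed value (NA if none yet)
  pos <- cummax(ifelse(is.na(x), 0L, seq_along(x)))
  pos[pos == 0L] <- NA
  filled <- x[pos]
  # Only overwrite rows up to the end of the series
  inside <- seq_along(x) <= last
  x[inside] <- filled[inside]
  x
}

# Same values as the example in the question (assumed to be daily data)
data <- data.frame(
  Date = seq(as.Date("2019-01-01"), by = "day", length.out = 10),
  time_series_1 = c(NA, 5, 10, 20, 30, NA, 7, 5, NA, NA),
  time_series_2 = c(10, NA, NA, 6, NA, 8, NA, NA, NA, NA)
)

# Apply the helper to every column except Date
value_cols <- setdiff(names(data), "Date")
data[value_cols] <- lapply(data[value_cols], fill_within_range)
print(data)
```

Leading NAs (before a series starts) stay NA because there is no earlier value to carry forward, and trailing NAs stay NA because they fall after the last observed row. If you prefer to stay in dplyr, the same helper should work inside mutate(across(-Date, fill_within_range)).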