Suppose I have the following data.table:

library(data.table)
set.seed(123)
df <- as.data.table(data.frame(date = c("2017-01-01", "2017-01-05", "2017-01-08", "2017-01-01", "2017-01-05", "2017-01-08"),
                 value = rnorm(6),
                 mygroup = rep(LETTERS[1:2], each = 3)))

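I want to fill in the missing dates within each group, carrying the last observed value forward. The closest thing I have found is this question, which shows how to do it without grouping: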
all_dates <- seq(from = as.Date("2017-01-01"),
                   to = as.Date("2017-01-08"),
                   by = "days")

df[J(all_dates), roll=Inf]

However, I need to do this by group, and using by results in an error:

Error in `[.data.table`(df, J(all_dates), roll = Inf, by = mygroup) :
  'by' or 'keyby' is supplied but not j

Best answer

We can add mygroup as another column in the rolling join:

df[, date := as.Date(date)]

df[
  df[, .(date = seq(first(date), last(date), by="day")), by=mygroup],
  on=.(mygroup, date),
  roll=TRUE]

          date       value mygroup
 1: 2017-01-01 -0.56047565       A
 2: 2017-01-02 -0.56047565       A
 3: 2017-01-03 -0.56047565       A
 4: 2017-01-04 -0.56047565       A
 5: 2017-01-05 -0.23017749       A
 6: 2017-01-06 -0.23017749       A
 7: 2017-01-07 -0.23017749       A
 8: 2017-01-08  1.55870831       A
 9: 2017-01-01  0.07050839       B
10: 2017-01-02  0.07050839       B
11: 2017-01-03  0.07050839       B
12: 2017-01-04  0.07050839       B
13: 2017-01-05  0.12928774       B
14: 2017-01-06  0.12928774       B
15: 2017-01-07  0.12928774       B
16: 2017-01-08  1.71506499       B
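
To see what the inner expression contributes, the per-group date grid can be built on its own and then used as the lookup table. This is a minimal sketch with placeholder values instead of the rnorm() draws. It uses min()/max() in place of first()/last(), a small variation that assumes nothing about row order within a group.

```r
library(data.table)

# Same shape as the question's data; the values are placeholders
df <- data.table(
  date    = as.Date(rep(c("2017-01-01", "2017-01-05", "2017-01-08"), 2)),
  value   = c(1, 2, 3, 4, 5, 6),
  mygroup = rep(c("A", "B"), each = 3)
)

# One row per calendar day between each group's earliest and latest date
grid <- df[, .(date = seq(min(date), max(date), by = "day")), by = mygroup]
counts <- grid[, .N, by = mygroup]

# Rolling join: each grid row picks up the latest earlier-or-equal observation
filled <- df[grid, on = .(mygroup, date), roll = TRUE]
```

Each group spans eight days here, so grid has 16 rows and filled repeats each observed value until the next observation.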

The "roll" always happens on the last column in on=.
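
A small illustration of that rule, using hypothetical toy tables (dt, lookup, grp and val are not from the question): the first on= column is matched exactly, and only the last one is rolled, so a value never leaks from one group into another.

```r
library(data.table)

# Toy data: group A has two observations, group B has one
dt <- data.table(
  grp  = c("A", "A", "B"),
  date = as.Date(c("2017-01-01", "2017-01-05", "2017-01-03")),
  val  = c(1, 2, 10)
)

# Dates to look up within each group
lookup <- data.table(
  grp  = c("A", "B", "B"),
  date = as.Date(c("2017-01-03", "2017-01-02", "2017-01-04"))
)

# grp must match exactly; date rolls forward from the last earlier observation
res <- dt[lookup, on = .(grp, date), roll = TRUE]
```

The B row for 2017-01-02 comes back as NA: group B has no earlier observation, and the A value from 2017-01-01 is not used even though its date is earlier.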

If the table has more columns and we only want to fill some of them...
# extend example
set.seed(1)
df[, y := rpois(.N, 1)]

# build new table
newDT = df[, .(date = seq(first(date), last(date), by="day")), by=mygroup]

roll_cols = "value"
newDT[, (roll_cols) :=
  df[newDT, on=.(mygroup, date), roll=TRUE, mget(paste0("x.", roll_cols))]]

noroll_cols = "y"
newDT[df, on=.(mygroup, date), (noroll_cols) := mget(paste0("i.", noroll_cols)) ]

    mygroup       date       value  y
 1:       A 2017-01-01 -0.56047565  0
 2:       A 2017-01-02 -0.56047565 NA
 3:       A 2017-01-03 -0.56047565 NA
 4:       A 2017-01-04 -0.56047565 NA
 5:       A 2017-01-05 -0.23017749  1
 6:       A 2017-01-06 -0.23017749 NA
 7:       A 2017-01-07 -0.23017749 NA
 8:       A 2017-01-08  1.55870831  1
 9:       B 2017-01-01  0.07050839  2
10:       B 2017-01-02  0.07050839 NA
11:       B 2017-01-03  0.07050839 NA
12:       B 2017-01-04  0.07050839 NA
13:       B 2017-01-05  0.12928774  0
14:       B 2017-01-06  0.12928774 NA
15:       B 2017-01-07  0.12928774 NA
16:       B 2017-01-08  1.71506499  2
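
The code above relies on the x. and i. prefixes, which inside j of a join x[i, on=...] refer to columns of x and of i respectively. A minimal sketch with hypothetical tables (x_tab, i_tab and their columns are made up for illustration) shows the difference when both tables share a column name:

```r
library(data.table)

# Both tables have a column named v
x_tab <- data.table(id = c(1L, 2L), v = c("x1", "x2"))
i_tab <- data.table(id = c(2L, 3L), v = c("i2", "i3"))

# One result row per row of i_tab; x.v is NA where x_tab has no matching id
res <- x_tab[i_tab, on = .(id), .(id = i.id, from_x = x.v, from_i = i.v)]
```

In the answer, df[newDT, ...] has df as x, so "x.value" is the rolled value from df, while in newDT[df, ...] the roles are swapped and "i.y" is the column from df.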

This Q&A about r - filling in missing dates by group with previous values is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/45599810/
