Suppose I have this data frame:

  times vals
1     1    2
2     3    4
3     7    6


Setup:

foo <- data.frame(times=c(1,3,7), vals=c(2,4,6))


I want this:

  times vals
1     1    2
2     2    2
3     3    4
4     4    4
5     5    4
6     6    4
7     7    6


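That is, I want to fill in every time from 1 to 7, taking each value from the most recent existing time that is not greater than the given time.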
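The rule above (for each time t, use the value from the latest known time that is less than or equal to t) is what base R's findInterval() computes against a sorted vector of breakpoints. Here is a minimal sketch in base R with no packages. It assumes foo$times is sorted in increasing order and that the output range starts at min(foo$times). The name all_times is only illustrative.

```r
foo <- data.frame(times = c(1, 3, 7), vals = c(2, 4, 6))

# Every time we want a row for (assumed to start at the first known time)
all_times <- seq(min(foo$times), max(foo$times))

# For each element of all_times, findInterval() returns the index i of the
# last foo$times value that is <= it (foo$times must be sorted ascending)
idx <- findInterval(all_times, foo$times)

filled <- data.frame(times = all_times, vals = foo$vals[idx])
filled
```

A time earlier than min(foo$times) would get index 0, and that element would silently disappear from foo$vals[idx]. This is why the range starts at the first known time.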

I have some code that does this with dplyr, but it's ugly. Any suggestions for something better?

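library(dplyr)

# Expand to every time from 1 to max(times); the new rows get vals = NA
foo <- merge(foo, data.frame(times=1:max(foo$times)), all.y=TRUE)
# Cross join foo with itself (a zero-length `by` gives the Cartesian product)
foo2 <- merge(foo, foo, by=c(), suffixes=c('', '.1'))

# For each time with a missing value, keep the latest earlier time that has one
foo2 <- foo2 %>% filter(is.na(vals) & !is.na(vals.1) & times.1 <= times) %>%
  group_by(times) %>% arrange(-times.1) %>% mutate(rn = row_number()) %>%
  filter(rn == 1) %>%
  mutate(vals = vals.1,
         rn = NULL,
         vals.1 = NULL,
         times.1 = NULL)

# Merge the found values back in and use them wherever vals is NA
foo <- merge(foo, foo2, by=c('times'), all.x=TRUE, suffixes=c('', '.2'))
foo <- mutate(foo,
              vals = ifelse(is.na(vals), vals.2, vals),
              vals.2 = NULL)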
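The question's merge-then-patch idea can be kept without the self-join. You merge onto the full grid of times and then carry the last non-missing value forward ("last observation carried forward"). The sketch below uses only base R. It assumes the first row after the merge has a value, and it relies on merge() sorting by the key column, which is its default behavior. The helper name locf is hypothetical.

```r
foo <- data.frame(times = c(1, 3, 7), vals = c(2, 4, 6))

# Hypothetical helper: replace each NA with the most recent non-NA value
# before it. cumsum(ok) counts how many non-NA values have been seen so far,
# which is exactly the index into x[ok] to use. Assumes x[1] is not NA.
locf <- function(x) {
  ok <- !is.na(x)
  x[ok][cumsum(ok)]
}

# Outer-join onto every time from 1 to max(times); missing times get vals = NA.
# merge() sorts the result by the key column by default (sort = TRUE).
full <- merge(foo, data.frame(times = 1:max(foo$times)), all.y = TRUE)
full$vals <- locf(full$vals)
full
```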

Best answer

A dplyr and tidyr option:

library(dplyr)
library(tidyr)

foo %>%
  right_join(data.frame(times = min(foo$times):max(foo$times)), by = "times") %>%
  arrange(times) %>%  # right_join() in dplyr >= 1.0 appends unmatched times at the end
  fill(vals)
#   times vals
# 1     1    2
# 2     2    2
# 3     3    4
# 4     4    4
# 5     5    4
# 6     6    4
# 7     7    6
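A related tidyr idiom skips the helper data frame and the join. complete() adds the missing times directly, and full_seq(times, 1) builds the regular sequence from min to max in steps of 1. This sketch assumes the times are whole numbers, because full_seq() stops with an error if the values do not fall on the step grid. As in the answer, fill() works down the rows in their current order, so the code sorts by time first.

```r
library(dplyr)
library(tidyr)

foo <- data.frame(times = c(1, 3, 7), vals = c(2, 4, 6))

res <- foo %>%
  # Add a row for every whole-number time between min and max (vals = NA there)
  complete(times = full_seq(times, 1)) %>%
  # fill() works down the rows, so make the time order explicit first
  arrange(times) %>%
  # Carry the last non-missing vals downward (fill()'s default direction is "down")
  fill(vals)
res
```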

Regarding "r - Fill in values in a data frame in R?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/37168929/

10-12 18:00