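I am trying to get a lagged variable by group, which is not possible with lag alone. The suggested solution is to pull out the distinct rows, lag them, and join the result back onto the data.

I would rather do this without creating an intermediate object, and I would like to do it inside a single chain. However, it does not work the way I expect: there seems to be some interaction between using `.` and a nested chain inside left_join.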

require(tidyverse)
#> Loading required package: tidyverse
df <- data.frame(Team = c("A", "A", "A", "A", "B", "B", "B", "C", "C", "D", "D"),
                 Date = c("2016-05-10","2016-05-10", "2016-05-10", "2016-05-10",
                          "2016-05-12", "2016-05-12", "2016-05-12",
                          "2016-05-15","2016-05-15",
                          "2016-05-30", "2016-05-30"),
                 Points = c(1,4,3,2,1,5,6,1,2,3,9)
)


#This works:
df %>% left_join(x = ., y = df %>%
                   distinct(Team, Date) %>%
                   mutate(Date_Lagged = lag(Date)))
#> Joining, by = c("Team", "Date")
#>    Team       Date Points Date_Lagged
#> 1     A 2016-05-10      1        <NA>
#> 2     A 2016-05-10      4        <NA>
#> 3     A 2016-05-10      3        <NA>
#> 4     A 2016-05-10      2        <NA>
#> 5     B 2016-05-12      1  2016-05-10
#> 6     B 2016-05-12      5  2016-05-10
#> 7     B 2016-05-12      6  2016-05-10
#> 8     C 2016-05-15      1  2016-05-12
#> 9     C 2016-05-15      2  2016-05-12
#> 10    D 2016-05-30      3  2016-05-15
#> 11    D 2016-05-30      9  2016-05-15

#And this works:
df %>% left_join(x = ., y = .)
#> Joining, by = c("Team", "Date", "Points")
#>    Team       Date Points
#> 1     A 2016-05-10      1
#> 2     A 2016-05-10      4
#> 3     A 2016-05-10      3
#> 4     A 2016-05-10      2
#> 5     B 2016-05-12      1
#> 6     B 2016-05-12      5
#> 7     B 2016-05-12      6
#> 8     C 2016-05-15      1
#> 9     C 2016-05-15      2
#> 10    D 2016-05-30      3
#> 11    D 2016-05-30      9

#This doesn't work despite the fact that `.` is df.
df %>% left_join(x = ., y = . %>%
                   distinct(Team, Date) %>%
                   mutate(Date_Lagged = lag(Date)))
#> Error in UseMethod("tbl_vars"): no applicable method for 'tbl_vars' applied to an object of class "c('fseq', 'function')"



#Desired output
distinct(df, Team, Date) %>%
  mutate(Date_Lagged = lag(Date)) %>%
  right_join(., df) %>%
  select(Team, Date, Points, Date_Lagged)
#> Joining, by = c("Team", "Date")
#>    Team       Date Points Date_Lagged
#> 1     A 2016-05-10      1        <NA>
#> 2     A 2016-05-10      4        <NA>
#> 3     A 2016-05-10      3        <NA>
#> 4     A 2016-05-10      2        <NA>
#> 5     B 2016-05-12      1  2016-05-10
#> 6     B 2016-05-12      5  2016-05-10
#> 7     B 2016-05-12      6  2016-05-10
#> 8     C 2016-05-15      1  2016-05-12
#> 9     C 2016-05-15      2  2016-05-12
#> 10    D 2016-05-30      3  2016-05-15
#> 11    D 2016-05-30      9  2016-05-15

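Created on 2018-06-12 by the reprex package (v0.2.0).

Best answer

To make your code work, you need to wrap the `.` in the y argument in curly braces, as shown below. A chain that starts with a bare `.` (`. %>% ...`) is not evaluated by magrittr; it defines a function (a functional sequence of class `fseq`), which is why left_join complains about an object of class "c('fseq', 'function')". `{.}` is an ordinary expression, so the nested chain is evaluated on the data.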

  df %>% left_join(x = ., y = {.} %>%
                   distinct(Team, Date) %>%
                   mutate(Date_Lagged = lag(Date)))

Joining, by = c("Team", "Date")
   Team       Date Points Date_Lagged
1     A 2016-05-10      1        <NA>
2     A 2016-05-10      4        <NA>
3     A 2016-05-10      3        <NA>
4     A 2016-05-10      2        <NA>
5     B 2016-05-12      1  2016-05-10
6     B 2016-05-12      5  2016-05-10
7     B 2016-05-12      6  2016-05-10
8     C 2016-05-15      1  2016-05-12
9     C 2016-05-15      2  2016-05-12
10    D 2016-05-30      3  2016-05-15
11    D 2016-05-30      9  2016-05-15
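
The difference between `. %>%` and `{.} %>%` can be seen without dplyr at all. The following is a minimal sketch using only magrittr; the names `f` and `res` and the toy vector `c(8, 8)` are made up for illustration and are not part of the question.

```r
library(magrittr)

# A chain whose left-hand side is a bare `.` is not evaluated;
# magrittr returns a function (a "functional sequence") instead.
f <- . %>% sum() %>% sqrt()
class(f)
#> [1] "fseq"     "function"

# The functional sequence can be called like any other function.
f(c(8, 8))
#> [1] 4

# Wrapping the dot in braces makes the left-hand side an ordinary
# expression, so the nested chain is evaluated right away with `.`
# bound to the value piped in from the outer chain.
res <- c(8, 8) %>% list(input = ., root = {.} %>% sum() %>% sqrt())
res$root
#> [1] 4
```

In the failing call from the question, the y argument is the equivalent of `f` above: a function rather than a data frame, so left_join has no method for it.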

Or you can do
df %>% left_join(df%>%
                   distinct(Team, Date) %>%
                   mutate(Date_Lagged = lag(Date)))
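
The two suggestions can also be combined: pass the data positionally, keep the nested chain on `{.}` so that `df` is not named twice, and state the join keys explicitly so that no "Joining, by = ..." message is printed. This is only a sketch on a shortened, made-up version of the data; the name `lagged` is hypothetical, and `stringsAsFactors = FALSE` is added on the assumption that an older R version might otherwise convert the strings to factors.

```r
library(dplyr)

# Shortened toy data with the same columns as in the question.
df <- data.frame(
  Team = c("A", "A", "B", "B", "C"),
  Date = c("2016-05-10", "2016-05-10", "2016-05-12", "2016-05-12", "2016-05-15"),
  Points = c(1, 4, 1, 5, 2),
  stringsAsFactors = FALSE
)

# The nested chain starts from {.}, so it is evaluated on the piped
# data instead of being turned into a function. Because `.` only
# appears inside a nested call, df is still inserted as the first
# argument of left_join().
lagged <- df %>%
  left_join(
    {.} %>%
      distinct(Team, Date) %>%
      mutate(Date_Lagged = lag(Date)),
    by = c("Team", "Date")
  )

lagged
```

Each row of `lagged` carries the date of the previous distinct Team/Date pair, with `NA` for the first pair.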

This question, about nested pipe chains in dplyr/left_join in R, comes from Stack Overflow: https://stackoverflow.com/questions/50827052/