I have the following data frame:
lineups <- tibble::tribble(
~lineupBefore, ~playerOut, ~playerIn,
"Justise Winslow, Bam Adebayo, Meyers Leonard, Tyler Herro, Kendrick Nunn", "Kendrick Nunn", "Goran Dragic",
NA, "Justise Winslow", "Derrick Jones Jr.",
NA, "Meyers Leonard", "Kelly Olynyk",
NA, "Bam Adebayo", "Justise Winslow",
NA, "Tyler Herro", "Duncan Robinson",
NA, "Derrick Jones Jr.", "Bam Adebayo",
NA, "Goran Dragic", "Kendrick Nunn",
NA, "Justise Winslow", "Tyler Herro",
NA, "Kelly Olynyk", "Meyers Leonard",
NA, "Bam Adebayo", "Justise Winslow"
)
Then I create a column:
lineups %>%
  mutate(lineupAfter = str_replace(lineupBefore, playerOut, playerIn))
The result is:
tibble::tribble(
~lineupBefore, ~playerOut, ~playerIn, ~lineupAfter,
"Justise Winslow, Bam Adebayo, Meyers Leonard, Tyler Herro, Kendrick Nunn", "Kendrick Nunn", "Goran Dragic", "Justise Winslow, Bam Adebayo, Meyers Leonard, Tyler Herro, Goran Dragic",
NA, "Justise Winslow", "Derrick Jones Jr.", NA,
NA, "Meyers Leonard", "Kelly Olynyk", NA,
NA, "Bam Adebayo", "Justise Winslow", NA,
NA, "Tyler Herro", "Duncan Robinson", NA,
NA, "Derrick Jones Jr.", "Bam Adebayo", NA,
NA, "Goran Dragic", "Kendrick Nunn", NA,
NA, "Justise Winslow", "Tyler Herro", NA,
NA, "Kelly Olynyk", "Meyers Leonard", NA,
NA, "Bam Adebayo", "Justise Winslow", NA
)
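Now I want to set the NA values in lineupBefore to the previous value of lineupAfter. The same function that created the lineupAfter column then has to be applied to the new lineupBefore value. If I try to do this with mutate, it only replaces the value in the first NA row. So I need the function to work on one row at a time, turning that row's NA into a real value before moving on to the next row. I think I need purrr for this, but I don't know how. Any help would be greatly appreciated!
Edit:
Here are the expected first 5 rows: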
tibble::tribble(
~lineupBefore, ~playerOut, ~playerIn, ~lineupAfter,
"Justise Winslow, Bam Adebayo, Meyers Leonard, Tyler Herro, Kendrick Nunn", "Kendrick Nunn", "Goran Dragic", "Justise Winslow, Bam Adebayo, Meyers Leonard, Tyler Herro, Goran Dragic",
"Justise Winslow, Bam Adebayo, Meyers Leonard, Tyler Herro, Goran Dragic", "Justise Winslow", "Derrick Jones Jr.", "Derrick Jones Jr., Bam Adebayo, Meyers Leonard, Tyler Herro, Goran Dragic",
"Derrick Jones Jr., Bam Adebayo, Meyers Leonard, Tyler Herro, Goran Dragic", "Meyers Leonard", "Kelly Olynyk", "Derrick Jones Jr., Bam Adebayo, Kelly Olynyk, Tyler Herro, Goran Dragic",
"Derrick Jones Jr., Bam Adebayo, Kelly Olynyk, Tyler Herro, Goran Dragic", "Bam Adebayo", "Justise Winslow", "Derrick Jones Jr., Justise Winslow, Kelly Olynyk, Tyler Herro, Goran Dragic",
"Derrick Jones Jr., Justise Winslow, Kelly Olynyk, Tyler Herro, Goran Dragic", "Tyler Herro", "Duncan Robinson", "Derrick Jones Jr., Justise Winslow, Kelly Olynyk, Duncan Robinson, Goran Dragic"
)
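As you can see, row 2 of lineupBefore equals row 1 of lineupAfter, row 3 of lineupBefore equals row 2 of lineupAfter, and so on.
At the same time, row 2 of lineupAfter is the result of applying str_replace(lineupBefore, playerOut, playerIn) to the filled-in row 2 of lineupBefore, and so on.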
Best answer
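You asked for a pipe-friendly, {purrr}-style approach. What you are doing here is accumulating changes from one set to the next, so you want purrr::accumulate and setdiff.
I think this is much easier if the lineup* columns are list columns rather than strings like these. That means storing a vector of names in each row of the column, instead of a single string with commas in it.
Starting from your first lineups table: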
library(stringr)
library(dplyr)
library(purrr)
lineups <-
tibble::tribble(
~lineupBefore, ~playerOut, ~playerIn,
"Justise Winslow, Bam Adebayo, Meyers Leonard, Tyler Herro, Kendrick Nunn", "Kendrick Nunn", "Goran Dragic",
NA, "Justise Winslow", "Derrick Jones Jr.",
NA, "Meyers Leonard", "Kelly Olynyk",
NA, "Bam Adebayo", "Justise Winslow",
NA, "Tyler Herro", "Duncan Robinson",
NA, "Derrick Jones Jr.", "Bam Adebayo",
NA, "Goran Dragic", "Kendrick Nunn",
NA, "Justise Winslow", "Tyler Herro",
NA, "Kelly Olynyk", "Meyers Leonard",
NA, "Bam Adebayo", "Justise Winslow"
)
lineups_list <-
  lineups %>%
  mutate(lineupBefore = str_split(lineupBefore, ", "))
lineups_list
# A tibble: 10 x 3
lineupBefore playerOut playerIn
<list> <chr> <chr>
1 <chr [5]> Kendrick Nunn Goran Dragic
2 <chr [1]> Justise Winslow Derrick Jones Jr.
3 <chr [1]> Meyers Leonard Kelly Olynyk
4 <chr [1]> Bam Adebayo Justise Winslow
5 <chr [1]> Tyler Herro Duncan Robinson
6 <chr [1]> Derrick Jones Jr. Bam Adebayo
7 <chr [1]> Goran Dragic Kendrick Nunn
8 <chr [1]> Justise Winslow Tyler Herro
9 <chr [1]> Kelly Olynyk Meyers Leonard
10 <chr [1]> Bam Adebayo Justise Winslow
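So now you have a lineupBefore column whose first element is a vector of length 5, while every other row holds a length-1 vector containing a single NA value.
The operation we want takes that first length-5 vector and then adds the playerIn names to it one after another (c(initial_players, new_player), over and over). That is what we would get in a basketball game with no limit on players: the roster would just keep growing. purrr::accumulate does this and returns the result at every step.
However, at every step we also want to remove the player named in playerOut, which amounts to calling setdiff(current_players, removed_player) over and over. To do both at once, we use purrr::accumulate2. The function we pass to it works on the arguments ..1, ..2 and ..3, in that order, and the result of one step becomes ..1 for the next step. The first vector we pass in is playerIn, so ..2 is what gets added to the result each time. The second vector is playerOut, so ..3 is what setdiff removes each time. We also have to initialize it with the starting roster (lineupBefore[[1]]); otherwise accumulate2 would start from the first element of playerIn instead of the actual lineup. You can see the output with something like the following: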
x <- lineups_list$playerIn
y <- lineups_list$playerOut
accumulate2(
x, y, ~setdiff(c(..1, ..2), ..3),
.init = lineups_list$lineupBefore[[1]]
)
[[1]]
[1] "Justise Winslow" "Bam Adebayo" "Meyers Leonard" "Tyler Herro" "Kendrick Nunn"
[[2]]
[1] "Justise Winslow" "Bam Adebayo" "Meyers Leonard" "Tyler Herro" "Goran Dragic"
[[3]]
[1] "Bam Adebayo" "Meyers Leonard" "Tyler Herro" "Goran Dragic" "Derrick Jones Jr."
[[4]]
[1] "Bam Adebayo" "Tyler Herro" "Goran Dragic" "Derrick Jones Jr." "Kelly Olynyk"
[[5]]
[1] "Tyler Herro" "Goran Dragic" "Derrick Jones Jr." "Kelly Olynyk" "Justise Winslow"
[[6]]
[1] "Goran Dragic" "Derrick Jones Jr." "Kelly Olynyk" "Justise Winslow" "Duncan Robinson"
[[7]]
[1] "Goran Dragic" "Kelly Olynyk" "Justise Winslow" "Duncan Robinson" "Bam Adebayo"
[[8]]
[1] "Kelly Olynyk" "Justise Winslow" "Duncan Robinson" "Bam Adebayo" "Kendrick Nunn"
[[9]]
[1] "Kelly Olynyk" "Duncan Robinson" "Bam Adebayo" "Kendrick Nunn" "Tyler Herro"
[[10]]
[1] "Duncan Robinson" "Bam Adebayo" "Kendrick Nunn" "Tyler Herro" "Meyers Leonard"
[[11]]
[1] "Duncan Robinson" "Kendrick Nunn" "Tyler Herro" "Meyers Leonard" "Justise Winslow"
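To make the difference between the two functions concrete, here is a minimal sketch on a made-up three-player roster (the letters are placeholders, not data from the question). It assumes only that purrr is installed.

```r
library(purrr)

# accumulate(): each step only adds a player, so the roster keeps growing.
growing <- accumulate(
  c("D", "E"),
  ~ c(.x, .y),
  .init = c("A", "B", "C")
)

# accumulate2(): each step adds one player (..2) and removes another (..3).
swapped <- accumulate2(
  c("D", "E"), c("A", "B"),
  ~ setdiff(c(..1, ..2), ..3),
  .init = c("A", "B", "C")
)

growing[[3]]  # roster after two additions and no removals
swapped[[3]]  # roster after two substitutions
```

Both results have three elements (the initial roster plus one per step), which is the same off-by-one that gives a list of length 11 for the ten substitutions above.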
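However, the result is a list of length 11. That is because we start from the .init argument, which counts as one of the steps. You may then notice that elements 2-11 are the lineupAfter values you want, and elements 1-10 are the lineupBefore values you want. So you can use the same call to compute both; you just need to cut off either the first element or the last one. (Note that you could use some version of lead / lag to offset one column against the other, which would save you from computing this twice. I wrote it this way anyway to show the parallel structure.)
lineups_list_filled <- lineups_list %>%
  mutate(
    lineupAfter = accumulate2(
      playerIn, playerOut, ~setdiff(c(..1, ..2), ..3),
      .init = lineupBefore[[1]]
    )[-1], # drop the first element (the initial lineup)
    lineupBefore = accumulate2(
      playerIn, playerOut, ~setdiff(c(..1, ..2), ..3),
      .init = lineupBefore[[1]]
    )[-(length(playerIn) + 1)] # drop the last element (the final lineup)
  )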
lineups_list_filled
# A tibble: 10 x 4
lineupBefore playerOut playerIn lineupAfter
<list> <chr> <chr> <list>
1 <chr [5]> Kendrick Nunn Goran Dragic <chr [5]>
2 <chr [5]> Justise Winslow Derrick Jones Jr. <chr [5]>
3 <chr [5]> Meyers Leonard Kelly Olynyk <chr [5]>
4 <chr [5]> Bam Adebayo Justise Winslow <chr [5]>
5 <chr [5]> Tyler Herro Duncan Robinson <chr [5]>
6 <chr [5]> Derrick Jones Jr. Bam Adebayo <chr [5]>
7 <chr [5]> Goran Dragic Kendrick Nunn <chr [5]>
8 <chr [5]> Justise Winslow Tyler Herro <chr [5]>
9 <chr [5]> Kelly Olynyk Meyers Leonard <chr [5]>
10 <chr [5]> Bam Adebayo Justise Winslow <chr [5]>
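The parenthetical above mentions that the accumulation does not have to be run twice. As a rough sketch of that idea, the snippet below runs accumulate2 once and slices the result with head() and tail() rather than lead / lag. The toy table and the names rosters and toy_filled are made up for illustration.

```r
library(dplyr)
library(purrr)
library(stringr)

# Small made-up table in the same shape as `lineups`.
toy <- tibble::tribble(
  ~lineupBefore, ~playerOut, ~playerIn,
  "A, B, C",     "C",        "D",
  NA,            "A",        "E"
)

# Run the accumulation once; the result has nrow(toy) + 1 elements.
rosters <- accumulate2(
  toy$playerIn, toy$playerOut,
  ~ setdiff(c(..1, ..2), ..3),
  .init = str_split(toy$lineupBefore[1], ", ")[[1]]
)

toy_filled <- toy %>%
  mutate(
    lineupBefore = head(rosters, -1),  # every state except the final one
    lineupAfter  = tail(rosters, -1)   # every state except the initial one
  )
```

Slicing one shared list also makes it harder for the two columns to drift out of sync through an indexing slip.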
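If you look at lineups_list_filled$lineupBefore and lineups_list_filled$lineupAfter, you will see that they match the corresponding elements of the length-11 list above. If you want to collapse them back into strings, for printing for example, you can always do:
lineups_list_filled %>%
  mutate_all(
    ~map_chr(., ~paste(.x, collapse = ", "))
  )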
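One difference from the expected output in the question: the setdiff approach appends the incoming player at the end of the vector, whereas str_replace keeps the replaced player's slot. If the position matters, a possible variant is to accumulate over the strings themselves. This is only a sketch on made-up names, not part of the approach above. It wraps the pattern in fixed() because a name such as "Derrick Jones Jr." contains a ".", which is a regex metacharacter. It also assumes no player's name is a substring of another's.

```r
library(purrr)
library(stringr)

# Made-up example data; the names are placeholders.
start      <- "A, B, C"
player_out <- c("C", "A")
player_in  <- c("D", "E")

# ..1 is the current lineup string, ..2 the player leaving, ..3 the player entering.
states <- unlist(accumulate2(
  player_out, player_in,
  ~ str_replace(..1, fixed(..2), ..3),
  .init = start
))

lineup_before <- head(states, -1)
lineup_after  <- tail(states, -1)
```

Here unlist() is used so that states is a plain character vector whether or not the installed purrr version simplifies the result.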
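P.S. This approach only works when the elements cannot repeat, such as the individual players on a roster. If you did this with arbitrary integers, for example, you could not have 3 in there twice, because setdiff drops duplicates from its result (it calls unique). In that case you could build your own version of setdiff using match and which, with some error checking for the edge cases.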
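As a rough illustration of that last idea, here is one way such a helper might look. The name remove_first is hypothetical, and this version uses only match (which returns the position of the first hit), so it removes a single occurrence and leaves any other duplicates in place.

```r
# Hypothetical helper: remove only the first occurrence of `value` from `x`.
remove_first <- function(x, value) {
  i <- match(value, x)  # position of the first match, or NA if absent
  if (is.na(i)) {
    stop("value not found: ", value)
  }
  x[-i]
}

kept <- remove_first(c(3, 1, 3, 2), 3)
```

In the accumulate2 call it would stand in for setdiff, for example ~ remove_first(c(..1, ..2), ..3). Whether a missing value should be an error or be silently ignored depends on the data.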