Problem description
I have the following mock data set:
A <- c("Acura", "BMW", "Toyota", NA)
B <- c("1993;2004;2010", "2013", "2003;2011", NA)
C <- c("Blue;Black;Gold", "Silver", NA, NA)
df <- data.frame(A = A, B = B, C = C)
So the data frame looks like this:
> df
       A              B               C
1  Acura 1993;2004;2010 Blue;Black;Gold
2    BMW           2013          Silver
3 Toyota      2003;2011            <NA>
4   <NA>           <NA>            <NA>
I would like to expand the data set to multiple rows so it looks like this:
> new_df
       A    B      C
1  Acura 1993   Blue
2  Acura 2004  Black
3  Acura 2010   Gold
4    BMW 2013 Silver
5 Toyota 2003   <NA>
6 Toyota 2011   <NA>
7   <NA> <NA>   <NA>
I tried tidyr::separate_rows, but it fails because separate_rows needs the same number of delimiters in each column of a given row. Row 3 (A = Toyota) is the problem: column C holds a single NA in that row rather than something like "NA;NA". These are the commands I ran and the output I got:
df %>% separate_rows(B, C, sep = ";", convert = TRUE)
Error: All nested columns must have the same number of elements.
df[c(1:2,4),] %>% separate_rows(B, C, sep = ";", convert = TRUE)
      A    B      C
1 Acura 1993   Blue
2 Acura 2004  Black
3 Acura 2010   Gold
4   BMW 2013 Silver
5  <NA>   NA   <NA>
df[c(3),] %>% separate_rows(B, C, sep = ";", convert = TRUE)
Error: All nested columns must have the same number of elements.
Can someone help me achieve new_df?
Recommended answer
The easiest solution might be to install the development version of tidyr (0.8.3.9000), since it seems to be fixed there. Use devtools::install_github("tidyverse/tidyr") to do that.
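To decide whether the upgrade is needed at all, the installed version can be compared against the one mentioned above. This is a minimal sketch that assumes 0.8.3.9000 is the right threshold, which the answer itself only states tentatively; `min_version` and `has_fix` are illustrative names.

```r
# Assumed threshold: the development version named in the answer.
min_version <- "0.8.3.9000"

# TRUE only if tidyr is installed and at least as new as min_version.
has_fix <- requireNamespace("tidyr", quietly = TRUE) &&
  utils::packageVersion("tidyr") >= min_version

if (!has_fix) {
  message("tidyr is missing or older than ", min_version)
}
```

If `has_fix` is FALSE, either update tidyr or use a workaround that does not depend on the fix.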
As a workaround for those who can't update or don't want to use a prerelease version of the package, we can count the number of separators required in each row and fill the missing values in the columns with that many separators. That lets separate_rows work and creates empty strings, which we then replace with NA.
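library(tidyverse)

A <- c("Acura", "BMW", "Toyota", NA)
B <- c("1993;2004;2010", "2013", "2003;2011", NA)
C <- c("Blue;Black;Gold", "Silver", NA, NA)
df <- data.frame(A = A, B = B, C = C, stringsAsFactors = FALSE)

df %>%
  # Build a string with as many ";" as B contains in each row (NA when B is NA)
  mutate(seps = str_pad("", width = str_count(B, ";"), pad = ";")) %>%
  # Replace missing B/C values with those separators so the piece counts match
  mutate_at(vars(B, C), ~ coalesce(., seps)) %>%
  separate_rows(B, C, sep = ";") %>%
  # Turn the empty strings produced by the padding back into NA
  mutate_at(vars(B, C), ~ str_replace(., "^$", NA_character_))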
#>        A    B      C seps
#> 1  Acura 1993   Blue   ;;
#> 2  Acura 2004  Black   ;;
#> 3  Acura 2010   Gold   ;;
#> 4    BMW 2013 Silver
#> 5 Toyota 2003   <NA>    ;
#> 6 Toyota 2011   <NA>    ;
#> 7   <NA> <NA>   <NA> <NA>
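For comparison, the same result can be sketched in base R without tidyr. This is not the answer's method: instead of padding with separators, it splits each cell and pads the shorter pieces with NA. `expand_rows` is a hypothetical helper name, and the sketch assumes the listed columns are character (or factor) columns.

```r
A <- c("Acura", "BMW", "Toyota", NA)
B <- c("1993;2004;2010", "2013", "2003;2011", NA)
C <- c("Blue;Black;Gold", "Silver", NA, NA)
df <- data.frame(A = A, B = B, C = C, stringsAsFactors = FALSE)

# Hypothetical helper: split `cols` on `sep`, one output row per piece.
expand_rows <- function(df, cols, sep = ";") {
  pieces <- lapply(seq_len(nrow(df)), function(i) {
    # Split each target cell of row i; a missing cell stays a single NA.
    parts <- lapply(df[i, cols, drop = FALSE], function(x) {
      x <- as.character(x)
      if (is.na(x)) NA_character_ else strsplit(x, sep, fixed = TRUE)[[1]]
    })
    # Number of output rows for this input row (at least one).
    n <- max(1L, lengths(parts))
    # Pad shorter vectors with NA so every column has n entries.
    parts <- lapply(parts, function(p) c(p, rep(NA_character_, n - length(p))))
    # Repeat the untouched columns n times and glue everything together.
    rest <- df[rep(i, n), setdiff(names(df), cols), drop = FALSE]
    out <- cbind(rest, as.data.frame(parts, stringsAsFactors = FALSE))
    out[names(df)]
  })
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

new_df <- expand_rows(df, c("B", "C"))
```

Unlike the pipeline above, this leaves B as character and adds no helper column. It also pads silently when two columns of a row disagree in length, so it will not flag malformed rows the way separate_rows does.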