Using tidyr's separate_rows on many columns when some columns have no delimiter

Question

I have the following mock data set:

A <- c("Acura", "BMW", "Toyota", NA)
B <- c("1993;2004;2010", "2013", "2003;2011", NA)
C <- c("Blue;Black;Gold", "Silver", NA, NA)

df <- data.frame(A = A, B = B, C = C)

So the data frame looks like this:

> df
         A                B                 C
  1  Acura   1993;2004;2010   Blue;Black;Gold
  2    BMW             2013            Silver
  3 Toyota        2003;2011              <NA>
  4   <NA>             <NA>              <NA>

I would like to expand the data set to multiple rows so it looks like this:

> new_df
          A         B          C
  1   Acura      1993       Blue
  2   Acura      2004      Black
  3   Acura      2010       Gold
  4     BMW      2013     Silver
  5  Toyota      2003       <NA>
  6  Toyota      2011       <NA>
  7    <NA>      <NA>       <NA>

I have tried using tidyr::separate_rows, but I get this error because separate_rows needs the same number of delimiters in every listed column within a given row. Row 3 (A = Toyota) is therefore a problem: column C is NA for that row, not something like "NA;NA". This is the command and the error I receive:

df %>% separate_rows(B, C, sep = ";", convert = TRUE)
   Error: All nested columns must have the same number of elements.

df[c(1:2,4),] %>% separate_rows(B, C, sep = ";", convert = TRUE)
      A    B      C
1 Acura 1993   Blue
2 Acura 2004  Black
3 Acura 2010   Gold
4   BMW 2013 Silver
5  <NA>   NA   <NA>

df[c(3),] %>% separate_rows(B, C, sep = ";", convert = TRUE)
    Error: All nested columns must have the same number of elements.
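A quick way to see which rows trigger the error is to count how many pieces each cell splits into. The base R sketch below is an illustration added here, not part of the original question; it rebuilds the mock data with stringsAsFactors = FALSE because strsplit() rejects factor columns (the default for data.frame() before R 4.0).

```r
# Mock data from the question, kept as character columns for strsplit()
df <- data.frame(
  A = c("Acura", "BMW", "Toyota", NA),
  B = c("1993;2004;2010", "2013", "2003;2011", NA),
  C = c("Blue;Black;Gold", "Silver", NA, NA),
  stringsAsFactors = FALSE
)

# Number of pieces each cell splits into (an NA cell yields a single NA piece)
n_B <- lengths(strsplit(df$B, ";", fixed = TRUE))
n_C <- lengths(strsplit(df$C, ";", fixed = TRUE))

# Rows where the two columns disagree are the ones separate_rows() rejects
mismatch <- which(n_B != n_C)
print(data.frame(A = df$A, n_B, n_C))
print(mismatch)
```

Only the Toyota row should show up as a mismatch: B splits into two pieces, while the NA in C counts as one.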

Can someone help me get to new_df?

Recommended answer

OK, the easiest solution might be to install the development version of tidyr (0.8.3.9000), since this seems to be fixed there. Use devtools::install_github("tidyverse/tidyr") to do that.

As a workaround for those who can't update, or don't want to use a prerelease version of the package, we can count how many separators each row needs and fill the missing values in the columns with that many separators. That lets separate_rows work; it creates empty strings, which we then replace with NA.

library(tidyverse)
A <- c("Acura", "BMW", "Toyota", NA)
B <- c("1993;2004;2010", "2013", "2003;2011", NA)
C <- c("Blue;Black;Gold", "Silver", NA, NA)
df <- data.frame(A = A, B = B, C = C, stringsAsFactors = FALSE)

df %>%
  # one ";" per separator found in B (counts come from B, so B must be filled whenever C is)
  mutate(seps = str_pad("", width = str_count(B, ";"), pad = ";")) %>%
  # fill NA cells with that many separators so every column splits into the same count
  mutate_at(vars(B, C), ~ coalesce(., seps)) %>%
  separate_rows(B, C, sep = ";") %>%
  # the padding leaves empty strings behind; turn them back into NA
  mutate_at(vars(B, C), ~ str_replace(., "^$", NA_character_)) %>%
  # drop the helper column so the result matches new_df
  select(-seps)
#>        A    B      C
#> 1  Acura 1993   Blue
#> 2  Acura 2004  Black
#> 3  Acura 2010   Gold
#> 4    BMW 2013 Silver
#> 5 Toyota 2003   <NA>
#> 6 Toyota 2011   <NA>
#> 7   <NA> <NA>   <NA>
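For readers who would rather avoid the tidyverse entirely, the same result can be sketched in base R. Note that this swaps in a different technique: instead of padding NA cells with separators, it recycles any cell that splits into a single piece (such as NA) to the row's piece count. Padding does not carry over directly, because base strsplit() drops a trailing empty piece (strsplit(";", ";") returns a single ""). expand_rows() is a hypothetical helper written for this illustration, not a function from tidyr or base R.

```r
df <- data.frame(
  A = c("Acura", "BMW", "Toyota", NA),
  B = c("1993;2004;2010", "2013", "2003;2011", NA),
  C = c("Blue;Black;Gold", "Silver", NA, NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: split `cols` on `sep` and emit one row per piece.
# A cell with fewer pieces (e.g. a single NA) is recycled to the row's maximum.
expand_rows <- function(data, cols, sep = ";") {
  # pieces[[col]][[i]] holds the split pieces of column `col` in row i
  pieces <- lapply(data[cols], function(x) strsplit(x, sep, fixed = TRUE))
  out <- lapply(seq_len(nrow(data)), function(i) {
    row_pieces <- lapply(pieces, `[[`, i)
    n <- max(lengths(row_pieces))
    # rep_len() recycles short pieces to length n
    expanded <- lapply(row_pieces, rep_len, length.out = n)
    # repeat the untouched columns once per piece
    keep <- data[rep(i, n), setdiff(names(data), cols), drop = FALSE]
    cbind(keep, as.data.frame(expanded, stringsAsFactors = FALSE))
  })
  result <- do.call(rbind, out)
  rownames(result) <- NULL
  result[names(data)]  # restore the original column order
}

new_df <- expand_rows(df, c("B", "C"))
print(new_df)
```

Unlike separate_rows, this sketch does not complain about genuinely mismatched counts (for example three pieces in B and two in C): rep_len() silently recycles them, so add an explicit check if such rows should be treated as errors.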
