I have a data frame like this:

ID <- c("A","A","A","A","A","A","A","A")
Step <- c("Step_1","Step_1","Step_2","Step_2","Step_3","Step_3","Step_3","Step_4")
Passfail <- c("Pass","Pass","Fail","Pass","Fail","Fail","Pass","Fail")
Measurement <- c("Length","Length","Breadth","Breadth",
                 "Height","Height","Height","Width")

df <- data.frame(ID,Step,Passfail,Measurement)

I am trying to create several columns that return 1 when a condition is true and 0 otherwise, grouped by (Measurement, ID, Step).

For each group:
  • AP = 1 if Passfail contains only passes
  • AF = 1 if Passfail contains only fails
  • SFP = 1 if Passfail contains exactly 1 fail and at least 1 pass
  • MFP = 1 if Passfail contains more than 1 fail and at least 1 pass

Expected output:
      Measurement ID   Step AP AF SFP MFP
           Length  A Step_1  1  0   0   0
          Breadth  A Step_2  0  0   1   0
           Height  A Step_3  0  0   0   1
            Width  A Step_4  0  1   0   0
    

    I am trying to get the AP and AF columns this way, but the result is not quite right:
    library(dplyr)
    df1 <- df %>%
      group_by(Measurement,ID,Step) %>%
      mutate(AP = case_when((Passfail == "Pass" & Passfail != "Fail") ~ 1, TRUE ~ 0),
             AF = case_when((Passfail == "Fail" & Passfail != "Pass") ~ 1, TRUE ~ 0)
             ) %>%
      distinct()
    

    Best answer

    Here is a fixed version of your approach:

    df %>%
      group_by(Measurement,ID,Step) %>%
      summarize(AP = case_when(all(Passfail == "Pass") ~ 1, TRUE ~ 0),
                AF = case_when(all(Passfail == "Fail") ~ 1, TRUE ~ 0),
                SFP = case_when(sum(Passfail == "Fail") == 1 & sum(Passfail == "Pass") > 0 ~ 1, TRUE ~ 0),
                MFP = case_when(sum(Passfail == "Fail") > 1 & sum(Passfail == "Pass") > 0 ~ 1, TRUE ~ 0))
    # A tibble: 4 x 7
    # Groups:   Measurement, ID [?]
    #   Measurement ID    Step      AP    AF   SFP   MFP
    #   <fct>       <fct> <fct>  <dbl> <dbl> <dbl> <dbl>
    # 1 Breadth     A     Step_2     0     0     1     0
    # 2 Height      A     Step_3     0     0     0     1
    # 3 Length      A     Step_1     1     0     0     0
    # 4 Width       A     Step_4     0     1     0     0
    

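    With all(...) we require the condition to hold for every value of Passfail in the group, while sum(Passfail == "Fail") counts the number of fails. Together, these two techniques cover all four cases.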
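To see the two building blocks in isolation, here is a minimal base R sketch using the Passfail values of the Height group from the question. The variable names (`height`, `n_fail`, and so on) are only for illustration and do not appear in the answer's code.

```r
# Passfail values of the Height group (Step_3) from the question's data
height <- c("Fail", "Fail", "Pass")

# all() is TRUE only if every element satisfies the condition
all_pass <- all(height == "Pass")   # FALSE: the group contains fails
all_fail <- all(height == "Fail")   # FALSE: the group contains a pass

# sum() on a logical vector counts the TRUE values
n_fail <- sum(height == "Fail")     # 2
n_pass <- sum(height == "Pass")     # 1

# More than one fail and at least one pass, so this group is MFP
mfp <- n_fail > 1 & n_pass > 0      # TRUE
```

Inside summarize(), the same expressions are evaluated once per group, which is why each group collapses to a single row.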

    Note, however, that because each variable has only two possible outcomes, you can also simplify the code slightly to:
    df %>%
      group_by(Measurement,ID,Step) %>%
      summarize(AP = 1 * all(Passfail == "Pass"),
                AF = 1 * all(Passfail == "Fail"),
                SFP = 1 * (sum(Passfail == "Fail") == 1 & sum(Passfail == "Pass") > 0),
                MFP = 1 * (sum(Passfail == "Fail") > 1 & sum(Passfail == "Pass") > 0))
    

    Each logical expression evaluates to TRUE or FALSE; multiplying by 1 coerces these logical values to the binary values 1 and 0, as required.
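As a quick illustration of that coercion, here is a small base R sketch. The vector `flags` is made up for the example, and `as.integer()` and unary `+` are alternatives that the answer above does not mention.

```r
flags <- c(TRUE, FALSE, TRUE)

1 * flags          # numeric (double): 1 0 1
as.integer(flags)  # integer: 1 0 1
+flags             # integer: 1 0 1
```

Multiplying by 1 yields doubles, which matches the <dbl> columns in the output shown earlier; the other two forms yield integers instead.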

    Source: "r - Mutate multiple columns to get 1 or 0 for passfail conditions" on Stack Overflow: https://stackoverflow.com/questions/54134095/
