I have a data frame like this:
ID <- c("A","A","A","A","A","A","A","A")
Step <- c("Step_1","Step_1","Step_2","Step_2","Step_3","Step_3","Step_3","Step_4")
Passfail <- c("Pass","Pass","Fail","Pass","Fail","Fail","Pass","Fail")
Measurement <- c("Length","Length","Breadth","Breadth",
"Height","Height","Height","Width")
df <- data.frame(ID,Step,Passfail,Measurement)
I am trying to create several columns that return 1 when a condition is true and 0 otherwise, grouped by (Measurement, ID, ToolID).
For each group, the expected output is:
Measurement ID Step AP AF SFP MFP
Length A Step_1 1 0 0 0
Breadth A Step_2 0 0 1 0
Height A Step_3 0 0 0 1
Width A Step_4 0 1 0 0
I am trying to get the AP and AF columns this way, but the result is not quite right:
library(dplyr)
df1 <- df %>%
  group_by(Measurement, ID, Step) %>%
  mutate(AP = case_when((Passfail == "Pass" & Passfail != "Fail") ~ 1, TRUE ~ 0),
         AF = case_when((Passfail == "Fail" & Passfail != "Pass") ~ 1, TRUE ~ 0)
  ) %>%
  distinct()
Best answer
Here is a fixed version of your approach:
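df %>%
  group_by(Measurement, ID, Step) %>%
  summarize(
    # AP: every row in the group is a pass
    AP = case_when(all(Passfail == "Pass") ~ 1, TRUE ~ 0),
    # AF: every row in the group is a fail
    AF = case_when(all(Passfail == "Fail") ~ 1, TRUE ~ 0),
    # SFP: exactly one fail together with at least one pass
    SFP = case_when(sum(Passfail == "Fail") == 1 & sum(Passfail == "Pass") > 0 ~ 1, TRUE ~ 0),
    # MFP: more than one fail together with at least one pass
    MFP = case_when(sum(Passfail == "Fail") > 1 & sum(Passfail == "Pass") > 0 ~ 1, TRUE ~ 0)
  )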
# A tibble: 4 x 7
# Groups: Measurement, ID [?]
# Measurement ID Step AP AF SFP MFP
# <fct> <fct> <fct> <dbl> <dbl> <dbl> <dbl>
# 1 Breadth A Step_2 0 0 1 0
# 2 Height A Step_3 0 0 0 1
# 3 Length A Step_1 1 0 0 0
# 4 Width A Step_4 0 1 0 0
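To see how these group-level conditions evaluate, here is a minimal base R sketch for a single group. The vector `height` is a hypothetical stand-in for the Passfail values of the Height / Step_3 rows in the example data; it is not part of the answer's code.

```r
# Hypothetical stand-in: Passfail values of the Height / Step_3 group
height <- c("Fail", "Fail", "Pass")

# all() is TRUE only if the comparison holds for every element
all_pass <- all(height == "Pass")
all_fail <- all(height == "Fail")

# sum() over a logical vector counts the TRUE elements
n_fail <- sum(height == "Fail")
n_pass <- sum(height == "Pass")

# Multiple fails together with at least one pass
mfp <- n_fail > 1 & n_pass > 0

print(c(all_pass = all_pass, all_fail = all_fail, mfp = mfp))
```

Inside `summarize()`, the same expressions are evaluated once per group, which is why each group collapses to a single row.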
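With all(...) we require that the condition holds for every value of Passfail in the group, while with sum(Passfail == "Fail") we count the number of failures. These two techniques together cover all four cases. Note, however, that because each variable has only two possible outcomes, you can also simplify the code slightly to: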
df %>%
  group_by(Measurement, ID, Step) %>%
  summarize(AP = 1 * all(Passfail == "Pass"),
            AF = 1 * all(Passfail == "Fail"),
            SFP = 1 * (sum(Passfail == "Fail") == 1 & sum(Passfail == "Pass") > 0),
            MFP = 1 * (sum(Passfail == "Fail") > 1 & sum(Passfail == "Pass") > 0))
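The coercion used in this simplified version can be checked on a plain logical vector. In the sketch below, `flags` is a hypothetical example vector, and `as.integer()` is an alternative not used in the answer, shown only for comparison.

```r
# Hypothetical logical vector for illustration
flags <- c(TRUE, FALSE, TRUE)

# Multiplying by 1 coerces TRUE/FALSE to numeric (double) 1/0
as_numeric <- 1 * flags

# as.integer() gives integer 1/0 instead (an alternative, not from the answer)
as_int <- as.integer(flags)

print(as_numeric)
print(as_int)
```

The only difference between the two is the column type of the result: `<dbl>` for `1 * ...`, as in the output shown earlier, and `<int>` for `as.integer()`.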
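Logical expressions evaluate to TRUE or FALSE; multiplying them by 1 coerces these logical vectors to the binary (1/0) vectors we need.
Regarding "r - Mutate multiple columns to get 1 or 0 for passfail conditions", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/54134095/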