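I have health data from a repeated-measures cohort study in which each person has multiple follow-up visits over the years. At baseline (visit 0), some people have already been diagnosed with the disease of interest, while others have not. Because my analysis looks at incident cases, I need to remove from my data the people who were diagnosed as "sick" at visit 0. How can I do this in the tidyverse? Below is an example of the kind of data structure I will be working with: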

subject_id <- c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,5,5,5,5)
visit <- c(0,1,2,3,0,1,2,3,0,1,2,3,0,1,2,3,0,1,2,3)
diagnosis <- c("not sick", "not sick", "not sick", "sick", "sick", "sick", "sick", "sick", "not sick", "not sick", "sick", "sick", "sick", "sick", "sick", "sick", "not sick", "not sick", "not sick", "sick")

cohort <- data.frame(subject_id, visit, diagnosis)
cohort

Best answer

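Edit: If you want to remove those subjects entirely, then:

library(dplyr)
cohort %>%
  group_by(subject_id) %>%
  # Flag baseline (visit 0) rows already diagnosed as "sick"
  mutate(Condn = ifelse(visit == 0 & diagnosis == "sick", 1, 0)) %>%
  # Keep a subject only if none of its rows is flagged
  filter(all(Condn == 0)) %>%
  ungroup()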
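
The helper column `Condn` is not strictly needed, since the same grouped test can go straight into `filter()`. The following is a minimal sketch, assuming dplyr is installed. It rebuilds the question's example data inline, and `incident_cohort` is a name introduced here only for illustration.

```r
library(dplyr)

# Same example data as in the question, rebuilt so the snippet runs on its own
cohort <- data.frame(
  subject_id = rep(1:5, each = 4),
  visit      = rep(0:3, times = 5),
  diagnosis  = c("not sick", "not sick", "not sick", "sick",
                 "sick", "sick", "sick", "sick",
                 "not sick", "not sick", "sick", "sick",
                 "sick", "sick", "sick", "sick",
                 "not sick", "not sick", "not sick", "sick")
)

# Drop every row of any subject who has a "sick" diagnosis at visit 0
incident_cohort <- cohort %>%
  group_by(subject_id) %>%
  filter(!any(visit == 0 & diagnosis == "sick")) %>%
  ungroup()

# Subjects that remain after the baseline exclusion
unique(incident_cohort$subject_id)
```

With this example data, subjects 2 and 4 are sick at visit 0, so only subjects 1, 3 and 5 should remain, each with all four visits.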

Original

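We could do the following. Note that this drops only the visit 0 rows diagnosed as "sick"; the later visits of those subjects stay in the data:
cohort %>%
  group_by(subject_id) %>%
  # Flag baseline (visit 0) rows already diagnosed as "sick"
  mutate(Condn = ifelse(visit == 0 & diagnosis == "sick", 1, 0)) %>%
  # Remove only the flagged rows, then drop the helper column
  filter(Condn == 0) %>%
  ungroup() %>%
  select(-Condn)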
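
To see what a row-level filter on `Condn` does, it can help to count the rows left per subject. This is a small sketch, assuming dplyr is installed. It rebuilds the question's example data inline, writes the row-level condition directly instead of going through `Condn`, and uses `rows_only`, a name introduced here only for illustration.

```r
library(dplyr)

# Same example data as in the question, rebuilt so the snippet runs on its own
cohort <- data.frame(
  subject_id = rep(1:5, each = 4),
  visit      = rep(0:3, times = 5),
  diagnosis  = c("not sick", "not sick", "not sick", "sick",
                 "sick", "sick", "sick", "sick",
                 "not sick", "not sick", "sick", "sick",
                 "sick", "sick", "sick", "sick",
                 "not sick", "not sick", "not sick", "sick")
)

# Row-level filter: removes only the baseline rows that are "sick"
rows_only <- cohort %>%
  filter(!(visit == 0 & diagnosis == "sick"))

# Rows remaining per subject; subjects sick at baseline are still listed
count(rows_only, subject_id)
```

Subjects 2 and 4 remain in `rows_only` with their visits 1 to 3, which is why a grouped filter is needed when the whole subject has to be excluded.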

For more on removing individuals from cohort study data based on a baseline characteristic, see a similar question on Stack Overflow: https://stackoverflow.com/questions/56795215/
