r - Calculating the first occurrence of a disease in R

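I am trying to calculate the first occurrence of a disease (for example myocardial infarction (MI), a "heart attack"), but I am struggling to implement this in R (base or tidyverse). Any help is appreciated.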

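Thank you all. This is great. I realize my example was not clear.
The methods work well overall, but I would like a way to get the incidence and prevalence for each time period.
Incidence is the number of new cases that occur at a particular time point divided by the number of people who did not yet have the disease at the previous time point.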
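
These definitions can be written out as follows. This is a sketch, not a standard epidemiological notation: the symbols $N$ (number of individuals) and $MI_{i,t}$ (status of individual $i$ at time $t$) are introduced here for illustration, and everyone is assumed to be disease-free before the first time point, i.e. $MI_{i,0} = 0$.

```latex
\[
\text{incidence}_t
  = \frac{\#\{\, i : MI_{i,t} = 1 \text{ and } MI_{i,t-1} = 0 \,\}}
         {\#\{\, i : MI_{i,t-1} = 0 \,\}},
\qquad
\text{prevalence}_t
  = \frac{\#\{\, i : MI_{i,t} = 1 \,\}}{N}
\]
```

For the example data, time 3 has one new case (id 1) among the four people who were still disease-free at time 2, so the incidence is 1/4, while two of the five people have the disease, so the prevalence is 2/5.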


n_id <- 5 # five individuals
n_time <- 4 # four time points
id <- rep(1:n_id, each = n_time)
time <- rep(1:n_time,times = n_id)
MI <- c(0,0,1,1,
        0,1,1,1,
        0,0,0,1,
        0,0,0,0,
        0,0,0,0)
dsn <- data.frame(id, time, MI)
MI2 <- c(0,0,1,NA,
         0,1,NA,NA,
         0,0,0,1,
         0,0,0,0,
         0,0,0,0)
dsn2 <- data.frame(id, time, MI, MI2)
library(dplyr)
dsn2 <- arrange(dsn2, time) # sort by time point; assign the result so the sorted data is printed
dsn2

#>    id time MI MI2
#> 1   1    1  0   0
#> 2   2    1  0   0
#> 3   3    1  0   0
#> 4   4    1  0   0
#> 5   5    1  0   0
#> 6   1    2  0   0
#> 7   2    2  1   1
#> 8   3    2  0   0
#> 9   4    2  0   0
#> 10  5    2  0   0
#> 11  1    3  1   1
#> 12  2    3  1  NA
#> 13  3    3  0   0
#> 14  4    3  0   0
#> 15  5    3  0   0
#> 16  1    4  1  NA
#> 17  2    4  1  NA
#> 18  3    4  1   1
#> 19  4    4  0   0
#> 20  5    4  0   0

# In the example above, the values can be calculated by hand as follows
# Incidence at each time point (new cases at that time point divided by the number of people still disease-free at the previous time point)
#time 1 = 0/5 =0
#time 2 = 1/5 =0.2
#time 3 = 1/4 =0.25
#time 4 = 1/3 =0.33

# Prevalence at each time point (new and existing cases divided by the total population)
#time 1 = 0/5 =0
#time 2 = 1/5 =0.2
#time 3 = 2/5 =0.4
#time 4 = 3/5 =0.6

time <- 1:4
incidence <- c(0/5, 1/5, 1/4, 1/3)
prevalence <- c(0/5, 1/5, 2/5, 3/5)

results <- cbind(time, incidence, prevalence)
results
#>      time incidence prevalence
#> [1,]    1 0.0000000        0.0
#> [2,]    2 0.2000000        0.2
#> [3,]    3 0.2500000        0.4
#> [4,]    4 0.3333333        0.6

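I would like to be able to do this at each time point, taking into account what happened at the previous time point. Would a for loop be the way to go?
Many thanks.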

Best answer

Following your edit, here is a solution that calculates the incidence. It also returns the correct result if the disease occurs at time 1.

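library(dplyr)

dsn %>%
  group_by(id) %>%
  # neg: TRUE only on the first row where MI == 1 for this id (a new case)
  mutate(neg = MI == 1 & !duplicated(MI)) %>%
  group_by(time) %>%
  summarise(d = sum(MI != 1),      # people without the disease at this time
            prevalence = mean(MI), # share of people with the disease
            n = sum(neg)) %>%      # new cases at this time
  transmute(time,
            # divide by the disease-free count at the previous time point;
            # before time 1 everyone is treated as disease-free
            incidence = n / lag(d, default = n_distinct(dsn$id)),
            prevalence)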

#>    time incidence prevalence
#>   <int>     <dbl>      <dbl>
#> 1     1     0            0
#> 2     2     0.2          0.2
#> 3     3     0.25         0.4
#> 4     4     0.333        0.6
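
On the question of whether a for loop would work: it can, although it is more verbose. The following is a minimal base R sketch, not part of the original answer. It assumes that everyone is disease-free before the first time point, that every id has a row at every time point, and that MI has no missing values (so it applies to the MI column, not to MI2 with its NA entries). The names `ids`, `times`, `res`, `prev_status` and `at_risk` are introduced here for illustration.

```r
# Rebuild the example data from the question (column MI only).
n_id <- 5
n_time <- 4
dsn <- data.frame(
  id   = rep(1:n_id, each = n_time),
  time = rep(1:n_time, times = n_id),
  MI   = c(0, 0, 1, 1,
           0, 1, 1, 1,
           0, 0, 0, 1,
           0, 0, 0, 0,
           0, 0, 0, 0)
)

ids   <- sort(unique(dsn$id))
times <- sort(unique(dsn$time))
res   <- data.frame(time = times, incidence = NA_real_, prevalence = NA_real_)

# Assumption: nobody has the disease before the first time point.
prev_status <- setNames(rep(0, length(ids)), ids)

for (k in seq_along(times)) {
  cur <- dsn[dsn$time == times[k], ]
  # Status at the current time point, aligned to the same id order.
  status <- setNames(cur$MI, cur$id)[names(prev_status)]

  # People who were still disease-free at the previous time point.
  at_risk <- prev_status == 0

  res$incidence[k]  <- sum(status == 1 & at_risk) / sum(at_risk)
  res$prevalence[k] <- mean(status == 1)

  # Carry the current status forward to the next iteration.
  prev_status <- status
}

res
```

For the example data this should give the same incidence and prevalence as the hand calculation in the question. If nobody is at risk at some time point, the incidence for that row is NaN because of the division by zero.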

Regarding "r - Calculating the first occurrence of a disease in R", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/60386680/