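This question is not about sampling rows from the data (I know about sample_n). It is about simulating data from a data frame so that the simulated results can be compared with the actual difference in group means (computed with group_by and summarise).
I computed the actual group means, and hence the observed difference between them, with: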
df %>%
  group_by(allfour) %>%
  summarise(hs_completion = mean(hsgrad),
            count = n())
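However, I am struggling to draw 100 simulations from each group, divide each resulting vector by its group size to turn it into a simulated graduation rate, and compute the difference in these rates between the two groups. After that, I need to plot a histogram of the simulated differences and add a red vertical line at the difference in means computed from the observed data.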
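I am familiar with tidyverse and ggplot, so the plotting is not the issue; the part I am stuck on is how to run the 100 simulations when the number of records is limited.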
A sample of the data frame df:
df <- structure(list(hsgrad = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 0L, 1L, 1L, 0L, 0L, 1L, 1L, 1L, 1L, 0L, 1L, 0L, 0L,
0L, 1L, 1L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 1L, 1L, 1L, 0L, 0L, 0L,
1L, 1L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 1L, 0L, 0L,
0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 1L, 0L, 0L,
1L, 1L, 0L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 1L,
1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 0L), allfour = structure(c(1L,
2L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 2L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L,
1L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 2L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 2L, 1L), .Label = c("0", "1"), class = "factor")), row.names = c(NA,
100L), class = "data.frame")
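Best answer
The important information is in this line:
So you need to simulate Bernoulli trials within each group, using a common probability of success. We compute the overall success (graduation) rate: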
rate = mean(df$hsgrad)
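The pooled rate is the rate both groups would share if group membership made no difference, and it equals the size-weighted average of the per-group rates. A quick check of that identity, on a made-up data frame (`toy` and its values are hypothetical and are not the asker's data):

```r
# Hypothetical toy data, only to make the example self-contained
toy <- data.frame(
  hsgrad  = c(1L, 1L, 0L, 1L, 1L, 0L, 0L, 1L, 0L, 1L),
  allfour = factor(c("0", "0", "0", "0", "0", "0", "1", "1", "1", "1"))
)

# Pooled (overall) graduation rate, ignoring the grouping
pooled_rate <- mean(toy$hsgrad)

# Per-group rates and sizes
group_rate <- as.numeric(tapply(toy$hsgrad, toy$allfour, mean))
group_size <- as.numeric(table(toy$allfour))

# Size-weighted average of the group rates
weighted_rate <- sum(group_rate * group_size) / sum(group_size)

all.equal(pooled_rate, weighted_rate)
```

Using the pooled rate for both groups is what makes the simulation a "no difference between groups" reference distribution.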
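The basic code for simulating one group looks like this. You supply the number of simulations (1000), the number of trials (i.e. the group size), and the success rate (from above):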
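# each element is a simulated count of graduates in the allfour == 1 group
sim_1 = rbinom(1000, sum(df$allfour == 1), prob = rate)
# divide by the group size to turn counts into simulated graduation rates
hist(sim_1 / sum(df$allfour == 1), breaks = 20)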
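The reason dividing by the group size works: `rbinom(nsim, size, prob)` returns, for each simulation, the number of successes among `size` independent Bernoulli trials. The sketch below compares that shortcut with simulating every individual separately. The values of `group_size` and `rate` are illustrative assumptions, not taken from the data.

```r
set.seed(1)        # for reproducibility of this sketch only
group_size <- 15   # assumed group size (illustrative)
rate <- 0.6        # assumed success probability (illustrative)
nsim <- 1000

# One binomial draw per simulation: number of successes out of group_size
sim_counts <- rbinom(nsim, size = group_size, prob = rate)
sim_rates <- sim_counts / group_size

# Same distribution, the long way: simulate each individual (size = 1),
# then take the mean within each simulated group
sim_rates_long <- replicate(
  nsim,
  mean(rbinom(group_size, size = 1, prob = rate))
)

# Both averages should be close to the assumed rate
c(binomial = mean(sim_rates), bernoulli = mean(sim_rates_long))
```

The two vectors are not identical draw for draw, but they follow the same distribution, so the single `rbinom` call is the cheaper way to get simulated rates.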
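Under the assumption that the group's rate equals the overall rate, sim_1 divided by the group size gives you the simulated success proportions for the allfour == 1 group. Now we just need to do the same for both groups: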
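library(dplyr)
library(ggplot2)

grp0_size = sum(df$allfour == 0)
grp1_size = sum(df$allfour == 1)
nsim = 1000
# observed difference in graduation rates (group 1 minus group 0)
observed = diff(tapply(df$hsgrad, df$allfour, mean))

data.frame(
  grp0_success = rbinom(nsim, grp0_size, rate) / grp0_size,
  grp1_success = rbinom(nsim, grp1_size, rate) / grp1_size) %>%
  mutate(diff = grp1_success - grp0_success) %>%
  ggplot(aes(x = diff)) + geom_histogram(bins = 30) +
  # red vertical line at the observed difference
  geom_vline(xintercept = observed, colour = "red")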
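As a possible follow-up that is not part of the original answer, the simulation can be wrapped in a small function, and the position of the observed difference can be summarised as the share of simulated differences at least as extreme. The function name `simulate_rate_diff`, the group sizes, the rate and the `observed` value below are all hypothetical placeholders; only base R is used.

```r
# Hypothetical helper: simulated differences in rates (group 1 minus group 0)
# when both groups share the same success probability `rate`
simulate_rate_diff <- function(n0, n1, rate, nsim = 1000) {
  rbinom(nsim, n1, rate) / n1 - rbinom(nsim, n0, rate) / n0
}

set.seed(42)  # for reproducibility of this sketch only
sim_diff <- simulate_rate_diff(n0 = 80, n1 = 20, rate = 0.6)

# Illustrative observed difference, NOT computed from the real data
observed <- -0.1

# Share of simulated differences at least as far from zero as the observed one
p_value <- mean(abs(sim_diff) >= abs(observed))
p_value
```

With the real data, `n0`, `n1`, `rate` and `observed` would be replaced by `grp0_size`, `grp1_size`, `rate` and `observed` as computed above.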
Regarding "r - Randomly draw 2 separate sets of 100 simulations by group from a data frame", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/61137865/