I have the following data frame:
col1 <- c("avi","chi","chi","bov","fox","bov","fox","avi","bov",
"chi","avi","chi","chi","bov","bov","fox","avi","bov","chi")
col2 <- c("low","med","high","high","low","low","med","med","med","high",
"low","low","high","high","med","med","low","low","med")
col3 <- c(0,1,1,1,0,1,0,0,0,0,0,0,1,1,1,1,0,1,0)
test_data <- cbind(col1, col2, col3)
test_data <- as.data.frame(test_data)
I would like to end up with a table like this (the values are random):
Species  Pop.density  %Resistance  CI_low  CI_high  Total samples
avi      low          2.0          1.2     2.2      30
avi      med          0            0       0.5      20
avi      high         3.5          2.9     4.2      10
chi      low          0.5          0.3     0.7      20
chi      med          2.0          1.9     2.1      150
chi      high         6.5          6.2     6.6      175
The %Resistance column is based on col3 above, where 1 = resistant and 0 = non-resistant. I tried the following:
library(dplyr)
test_data<-test_data %>%
count(col1,col2,col3) %>%
group_by(col1, col2) %>%
mutate(perc_res = prop.table(n)*100)
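This almost does the trick: I get the percentage of 1s and 0s in col3 for each combination of values in col1 and col2. However, the total sample count is wrong, because I am counting over all three columns when the correct count should be over col1 and col2 only.
For the confidence interval, I would use the following: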
binom.test(resistant_samples, total_samples)$conf.int * 100
But I am not sure how to combine it with the rest of the pipeline.
Is there a quick and simple way to do this?
Best answer
This should work.
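library(tidyverse)
library(broom)
test_data %>%
  # Label the outcome so the spread columns get readable names
  mutate(col3 = ifelse(col3 == 0, "NonResistant", "Resistant")) %>%
  # Count samples per species / density / outcome, then make one column per outcome
  count(col1, col2, col3) %>%
  spread(col3, n, fill = 0) %>%
  # Express the resistant share as a percentage, on the same scale as the CI columns
  mutate(PercentResistant = Resistant / (NonResistant + Resistant) * 100) %>%
  # Run binom.test per row and tidy each result into a nested data frame
  mutate(test = map2(Resistant, NonResistant, ~ binom.test(.x, .x + .y) %>% tidy())) %>%
  unnest(test) %>%
  transmute(Species = col1, Pop.density = col2, PercentResistant, CI_low = conf.low * 100, CI_high = conf.high * 100, TotalSamples = Resistant + NonResistant)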
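The tidied output of each binom.test() call is stored as a nested data frame in the test column, which unnest() then expands into ordinary columns.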
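If you would rather not depend on broom, the same table can probably be built with dplyr alone. The following is only a sketch under a few assumptions: it rebuilds the sample data with data.frame() so that col3 stays numeric (rather than being coerced to character by cbind()), it needs dplyr 1.0 or later for the .groups argument, and summary_tbl is just an illustrative name. It counts the total per col1/col2 group first, which sidesteps the wrong-total problem described in the question, and then calls binom.test() row by row.

```r
library(dplyr)

# Same sample values as in the question; data.frame() keeps col3 numeric.
test_data <- data.frame(
  col1 = c("avi","chi","chi","bov","fox","bov","fox","avi","bov",
           "chi","avi","chi","chi","bov","bov","fox","avi","bov","chi"),
  col2 = c("low","med","high","high","low","low","med","med","med","high",
           "low","low","high","high","med","med","low","low","med"),
  col3 = c(0,1,1,1,0,1,0,0,0,0,0,0,1,1,1,1,0,1,0)
)

summary_tbl <- test_data %>%
  # Group on species and density only, so n() is the true group total
  group_by(Species = col1, Pop.density = col2) %>%
  summarise(
    Resistant    = sum(col3 == 1),
    TotalSamples = n(),
    .groups = "drop"
  ) %>%
  # binom.test() is not vectorised, so evaluate it one row at a time
  rowwise() %>%
  mutate(
    PercentResistant = 100 * Resistant / TotalSamples,
    CI_low  = binom.test(Resistant, TotalSamples)$conf.int[1] * 100,
    CI_high = binom.test(Resistant, TotalSamples)$conf.int[2] * 100
  ) %>%
  ungroup() %>%
  select(Species, Pop.density, PercentResistant, CI_low, CI_high, TotalSamples)

print(summary_tbl)
```

binom.test() is called twice per row here for readability; on a large table it may be worth computing the interval once per row and splitting it afterwards.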
Regarding "r - Automatically calculating summary statistics of a data frame and creating a new table", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/46242127/