Problem description
I'm trying to count the frequency of a specific value in every column.
Basically, I am looking at how different bacterial isolates (each represented by a row) respond to treatment with different antibiotics (each represented by a column). "1" means the isolate is resistant to the antibiotic, while "0" means the isolate is susceptible to the antibiotic.
antibiotic1 <- c(1, 1, 0, 1, 0, 1, NA, 0, 1)
antibiotic2 <- c(0, 0, NA, 0, 1, 1, 0, 0, 0)
antibiotic3 <- c(0, 1, 1, 0, 0, NA, 1, 0, 0)
ab <- data.frame(antibiotic1, antibiotic2, antibiotic3)
ab
  antibiotic1 antibiotic2 antibiotic3
1           1           0           0
2           1           0           1
3           0          NA           1
4           1           0           0
5           0           1           0
6           1           1          NA
7          NA           0           1
8           0           0           0
9           1           0           0
So looking at the first row, isolate 1 is resistant to antibiotic 1, susceptible to antibiotic 2, and susceptible to antibiotic 3.
I want to calculate the percentage of isolates resistant to each antibiotic, i.e. sum the number of "1"s in each column and divide by the number of isolates in that column (excluding NAs from the denominator).
I know how to do the counts:
library(plyr)  # count() and ldply() come from the plyr package
apply(ab, 2, count)
$antibiotic1
   x freq
1  0    3
2  1    5
3 NA    1

$antibiotic2
   x freq
1  0    6
2  1    2
3 NA    1

$antibiotic3
   x freq
1  0    5
2  1    3
3 NA    1
But my actual dataset contains many different antibiotics and hundreds of isolates, so I want to be able to run a function across all columns at once and get a data frame back.
I tried:
counts <- ldply(ab, function(x) sum(x=="1")/(sum(x=="1") + sum(x=="0")))
but this yields NA:
.id V1
1 antibiotic1 NA
2 antibiotic2 NA
3 antibiotic3 NA
I've also tried:
library(dplyr)
ab %>%
summarise_each(n = n())) %>%
mutate(prop.resis = n/sum(n))
but I get an error message that reads:
Error in n() : This function should not be called directly
Any advice would be greatly appreciated.
Recommended answer
I would use colMeans:
colMeans(ab, na.rm = TRUE)
# antibiotic1 antibiotic2 antibiotic3
# 0.625 0.250 0.375
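The reason this works: for a column coded only as 0 and 1, the mean of the non-missing values is the number of 1s divided by the number of non-missing entries, which is exactly the proportion asked for. The ldply attempt in the question returns NA because sum(x == "1") is itself NA whenever x contains an NA and na.rm is not set. The sketch below rebuilds the example data and compares the shortcut with the explicit calculation; the names explicit and shortcut are only for illustration.

```r
# Rebuild the example data from the question
antibiotic1 <- c(1, 1, 0, 1, 0, 1, NA, 0, 1)
antibiotic2 <- c(0, 0, NA, 0, 1, 1, 0, 0, 0)
antibiotic3 <- c(0, 1, 1, 0, 0, NA, 1, 0, 0)
ab <- data.frame(antibiotic1, antibiotic2, antibiotic3)

# Explicit version: count of 1s divided by count of non-missing entries
explicit <- sapply(ab, function(x) sum(x == 1, na.rm = TRUE) / sum(!is.na(x)))

# Shortcut: the mean of a 0/1 column, ignoring NAs
shortcut <- colMeans(ab, na.rm = TRUE)

# Both approaches should agree
all.equal(explicit, shortcut)
```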
As a side note, this can easily be generalized to calculate the frequency of any number. If, for instance, you were looking for the frequency of the number 2 in all columns, you could simply modify it to colMeans(ab == 2, na.rm = TRUE)
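A small sketch of that generalization follows. The data frame ab2 and its three-level coding (0, 1, 2) are made up purely for illustration, since the example data in the question contains only 0, 1 and NA. Comparing the data frame with 2 gives a logical matrix, and colMeans then averages TRUE as 1 and FALSE as 0 while dropping the NAs.

```r
# Hypothetical data with a third code, 2 (invented for this example only)
ab2 <- data.frame(
  antibiotic1 = c(2, 1, 0, 2, NA),
  antibiotic2 = c(0, 2, NA, 0, 1)
)

# ab2 == 2 is a logical matrix; NA entries stay NA and are dropped by na.rm
freq_of_2 <- colMeans(ab2 == 2, na.rm = TRUE)
freq_of_2
# antibiotic1 antibiotic2
#        0.50        0.25
```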
Or similarly, just use sapply (this avoids the matrix conversion, at the cost of evaluating column by column):
sapply(ab, mean, na.rm = TRUE)
# antibiotic1 antibiotic2 antibiotic3
# 0.625 0.250 0.375
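Since the question asks for a data frame and for percentages, one possible way to package the result is sketched below. The column names antibiotic, pct_resistant and n_tested are arbitrary choices for this example, not something from the question.

```r
# Rebuild the example data from the question
antibiotic1 <- c(1, 1, 0, 1, 0, 1, NA, 0, 1)
antibiotic2 <- c(0, 0, NA, 0, 1, 1, 0, 0, 0)
antibiotic3 <- c(0, 1, 1, 0, 0, NA, 1, 0, 0)
ab <- data.frame(antibiotic1, antibiotic2, antibiotic3)

# One row per antibiotic: percent resistant and number of non-missing isolates
resistance <- data.frame(
  antibiotic    = names(ab),
  pct_resistant = unname(100 * sapply(ab, mean, na.rm = TRUE)),
  n_tested      = unname(colSums(!is.na(ab)))
)
resistance
```

Keeping n_tested next to the percentage makes it easy to see how many isolates each figure is based on once the NAs are excluded.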