Problem description
I have a data.frame of categorical variables that I have divided into groups, and I obtained the counts for each group.
My original data nyD looks like:
```
Source: local data frame [7 x 3]
Groups: v1, v2, v3

  v1    v2   v3
1  a  plus  yes
2  a  plus  yes
3  a minus   no
4  b minus  yes
5  b     x  yes
6  c     x notk
7  c     x notk
```
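To make the examples on this page reproducible, here is one way to rebuild nyD from the printout above. It is a hypothetical reconstruction: it uses a plain data.frame with character columns and leaves it ungrouped, whereas the printout suggests the original was a grouped tbl_df.

```r
library(dplyr)

# Hypothetical reconstruction of nyD from the printed output (ungrouped)
nyD <- data.frame(
  v1 = c("a", "a", "a", "b", "b", "c", "c"),
  v2 = c("plus", "plus", "minus", "minus", "x", "x", "x"),
  v3 = c("yes", "yes", "no", "yes", "yes", "notk", "notk"),
  stringsAsFactors = FALSE
)

# Row counts per v1 level: a = 3, b = 2, c = 2
count(nyD, v1)
```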
I performed the following operations using dplyr:
```r
ny1 <- nyD %>%
  group_by(v1, v2, v3) %>%
  summarise(count = n()) %>%
  mutate(prop = count / sum(count))
```
My data "ny1" looks like:
```
Source: local data frame [5 x 5]
Groups: v1, v2

  v1    v2   v3 count prop
1  a minus   no     1    1
2  a  plus  yes     2    1
3  b minus  yes     1    1
4  b     x  yes     1    1
5  c     x notk     2    1
```
I want the prop variable to hold the relative frequency within each v1 group: prop should be the corresponding count divided by the total count for that v1 group. In v1 there are 3 "a", 2 "b" and 2 "c" in total. That is, ny1$prop[1] should be 1/3, ny1$prop[2] should be 2/3, and so on. The mutate step using count/sum(count) does not do this; I need the sum to be taken only within each v1 group. Is there a way to achieve this with dplyr?
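A quick way to see why count/sum(count) gives 1 everywhere is to inspect the grouping that survives summarise(). The sketch below assumes dplyr 0.7 or later (for group_vars()) and a hypothetical reconstruction of nyD from the printout. It also shows that regrouping by v1 alone before mutate() gives the intended proportions while keeping the v3 column.

```r
library(dplyr)

# Hypothetical reconstruction of nyD from the printed output
nyD <- data.frame(
  v1 = c("a", "a", "a", "b", "b", "c", "c"),
  v2 = c("plus", "plus", "minus", "minus", "x", "x", "x"),
  v3 = c("yes", "yes", "no", "yes", "yes", "notk", "notk"),
  stringsAsFactors = FALSE
)

ny1 <- nyD %>%
  group_by(v1, v2, v3) %>%
  summarise(count = n()) %>%
  mutate(prop = count / sum(count))

# summarise() peeled off only v3, so mutate() still summed per (v1, v2)
group_vars(ny1)  # "v1" "v2"

# Regrouping by v1 replaces the remaining groups; sums are now per v1
fixed <- ny1 %>%
  group_by(v1) %>%
  mutate(prop = count / sum(count))
```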
Recommended answer
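You can do the whole thing in one step, starting from your original data nyD and without creating ny1. This works because summarise() by default drops the last grouping level (here v2), so the mutate() that follows computes sum(count) within each v1 group only (certainly my favorite feature in dplyr). In your code you grouped by v1, v2 and v3, so summarise() only peeled off v3 and the sums were still taken per (v1, v2) combination, which is why every prop came out as 1.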
```r
nyD %>%
  group_by(v1, v2) %>%
  summarise(count = n()) %>%
  mutate(prop = count / sum(count))
# Source: local data frame [5 x 4]
# Groups: v1
#
#   v1    v2 count      prop
# 1  a minus     1 0.3333333
# 2  a  plus     2 0.6666667
# 3  b minus     1 0.5000000
# 4  b     x     1 0.5000000
# 5  c     x     2 1.0000000
```
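In dplyr 1.0.0 and later, summarise() accepts a .groups argument, so the level-dropping described above can be spelled out explicitly (this also silences the "has grouped output" message). A sketch, assuming dplyr >= 1.0.0 and the same hypothetical reconstruction of nyD:

```r
library(dplyr)  # assumes dplyr >= 1.0.0, where summarise() has .groups

# Hypothetical reconstruction of nyD from the printed output
nyD <- data.frame(
  v1 = c("a", "a", "a", "b", "b", "c", "c"),
  v2 = c("plus", "plus", "minus", "minus", "x", "x", "x"),
  v3 = c("yes", "yes", "no", "yes", "yes", "notk", "notk"),
  stringsAsFactors = FALSE
)

res <- nyD %>%
  group_by(v1, v2) %>%
  summarise(count = n(), .groups = "drop_last") %>%  # drop v2, keep v1
  mutate(prop = count / sum(count))                    # sums within v1

res
```

"drop_last" is already the default here, so the result matches the answer's output; naming it just makes the reliance on that behavior visible.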
Or a shorter version using count() (thanks to @beginneR):
```r
nyD %>%
  count(v1, v2) %>%
  mutate(prop = n / sum(n))
```
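One caveat, as far as I know for current dplyr releases: count() now returns output with the same grouping as its input instead of dropping the last level, so on an ungrouped table the count() version above divides by the grand total (7) rather than by the v1 totals. A version-independent sketch regroups explicitly. The ungroup() call is there in case nyD is still grouped, as the printout in the question suggests, and nyD is again a hypothetical reconstruction.

```r
library(dplyr)

# Hypothetical reconstruction of nyD from the printed output
nyD <- data.frame(
  v1 = c("a", "a", "a", "b", "b", "c", "c"),
  v2 = c("plus", "plus", "minus", "minus", "x", "x", "x"),
  v3 = c("yes", "yes", "no", "yes", "yes", "notk", "notk"),
  stringsAsFactors = FALSE
)

res2 <- nyD %>%
  ungroup() %>%               # discard any leftover grouping
  count(v1, v2) %>%           # one row per (v1, v2) with column n
  group_by(v1) %>%            # make the v1 grouping explicit
  mutate(prop = n / sum(n))   # relative frequency within each v1 group

res2
```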