问题描述
在使用plyr软件包中的ddply函数时遇到了一些麻烦.我正在尝试总结以下数据,包括每个组中的计数和比例.这是我的数据:
I am having some trouble using the ddply function from the plyr package. I am trying to summarise the following data with counts and proportions within each group. Here's my data:
structure(list(X5employf = structure(c(1L, 3L, 1L, 1L, 1L, 3L,
1L, 1L, 1L, 3L, 1L, 1L, 1L, 2L, 2L, 3L, 3L, 3L, 1L, 2L, 2L, 2L,
2L, 2L, 1L, 1L, 1L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 1L, 1L, 3L, 1L,
3L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 3L, 3L,
3L, 3L, 1L), .Label = c("increase", "decrease", "same"), class = "factor"),
X5employff = structure(c(2L, 6L, NA, 2L, 4L, 6L, 5L, 2L,
2L, 8L, 2L, 2L, 2L, 7L, 7L, 8L, 11L, 7L, 2L, 8L, 8L, 11L,
7L, 6L, 2L, 5L, 2L, 8L, 7L, 7L, 7L, 8L, 6L, 7L, 5L, 5L, 7L,
2L, 6L, 7L, 2L, 2L, 2L, 2L, 2L, 5L, 5L, 5L, 2L, 5L, 2L, 2L,
2L, 5L, 12L, 2L, 2L, 2L, 2L, 5L, 5L, 5L, 5L, 2L, 5L, 2L,
13L, 9L, 9L, 9L, 7L, 8L, 5L), .Label = c("", "1", "1 and 8",
"2", "3", "4", "5", "6", "6 and 7", "6 and 7 ", "7", "8",
"1 and 8"), class = "factor")), .Names = c("X5employf", "X5employff"
), row.names = c(NA, 73L), class = "data.frame")
这是我使用ddply的电话:
And here's my call using ddply:
ddply(kano_final, .(X5employf, X5employff), summarise, n=length(X5employff), prop=(n/sum(n))*100)
这给了我X5employff
的每个实例的正确计数,但是似乎比例是在每一行中而不是在因子X5employf
的每个级别内计算的,如下所示:
This gives me the counts of each instance of X5employff
correctly, but but seems as though the proportion is being calculated across each row and not within each level of the factor X5employf
as follows:
X5employf X5employff n prop
1 increase 1 26 100
2 increase 2 1 100
3 increase 3 15 100
4 increase 1 and 8 1 100
5 increase <NA> 1 100
6 decrease 4 1 100
7 decrease 5 5 100
8 decrease 6 2 100
9 decrease 7 1 100
10 decrease 8 1 100
11 same 4 4 100
12 same 5 6 100
13 same 6 5 100
14 same 6 and 7 3 100
15 same 7 1 100
手动计算每个组中的比例时,我得到以下信息:
When manually calculating the proportions within each group I get this:
X5employf X5employff n prop
1 increase 1 26 59.09
2 increase 2 1 2.27
3 increase 3 15 34.09
4 increase 1 and 8 1 2.27
5 increase <NA> 1 2.27
6 decrease 4 1 10.00
7 decrease 5 5 50.00
8 decrease 6 2 20.00
9 decrease 7 1 10.00
10 decrease 8 1 10.00
11 same 4 4 21.05
12 same 5 6 31.57
13 same 6 5 26.31
14 same 6 and 7 3 15.78
15 same 7 1 5.26
如您所见,X5employf各个级别的比例总和等于100.
As you can see the sum of proportions in each level of factor X5employf equals 100.
我知道这可能很简单,但是尽管阅读了各种类似的文章,我似乎还是无法理解.有人可以帮忙吗?我对摘要功能的工作方式有了解吗?!
I know this is probably ridiculously simple, but I can't seem to get my head around it despite reading all sorts of similar posts. Can anyone help with this and my understanding of how the summarise function works?!
非常感谢
马蒂
推荐答案
您不能在一个ddply
调用中执行此操作,因为传递给每个summarize
调用的是该组特定组合的数据子集变量.在此最低级别,您无权访问该中间级别sum(n)
.而是分两个步骤进行操作:
You cannot do it in one ddply
call because what gets passed to each summarize
call is a subset of your data for a specific combination of your group variables. At this lowest level, you do not have access to that intermediate level sum(n)
. Instead, do it in two steps:
kano_final <- ddply(kano_final, .(X5employf), transform,
sum.n = length(X5employf))
ddply(kano_final, .(X5employf, X5employff), summarise,
n = length(X5employff), prop = n / sum.n[1] * 100)
编辑:使用单个ddply
调用,并按提示使用table
:
Edit: using a single ddply
call and using table
as you hinted towards:
ddply(kano_final, .(X5employf), summarise,
n = Filter(function(x) x > 0, table(X5employff, useNA = "ifany")),
prop = 100* prop.table(n),
X5employff = names(n))
这篇关于ddply汇总比例计数的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!