Problem description
I am having trouble preparing a summary table with dplyr
from the data set below:
set.seed(1)
df <- data.frame(rep(sample(c(2012, 2016), 10, replace = TRUE)),
                 sample(c('Treat', 'Control'), 10, replace = TRUE),
                 runif(10, 0, 1),
                 runif(10, 0, 1),
                 runif(10, 0, 1))
colnames(df) <- c('Year','Group','V1','V2','V3')
I want to calculate the mean, median, and standard deviation, and count the number of observations, for each combination of Year and Group.
I have successfully used this code to get the mean, median and sd:
summary.table = df %>%
  group_by(Year, Group) %>%
  summarise_all(funs(n(), sd, median, mean))
However, I do not know how to introduce the n() function inside the funs() call. It gave me a separate count for each of V1, V2 and V3. This is redundant, since I only want the sample size of each group. I have tried introducing
mutate(N = n()) %>%
before and after the group_by() line, but it did not give me what I wanted.
Any help would be appreciated.
I had not made my question clear enough. The problem is that the code gives me columns that I do not need, since the number of observations for V1 alone is sufficient for me.
Recommended answer
Before summarizing, add the N column as an extra grouping column:
library(dplyr)
set.seed(1)
df <- data.frame(Year = rep(sample(c(2012, 2016), 10, replace = TRUE)),
                 Group = sample(c('Treat', 'Control'), 10, replace = TRUE),
                 V1 = runif(10, 0, 1),
                 V2 = runif(10, 0, 1),
                 V3 = runif(10, 0, 1))
df2 <- df %>%
  group_by(Year, Group) %>%
  group_by(N = n(), add = TRUE) %>%
  summarise_all(funs(sd, median, mean))
df2
#> # A tibble: 4 x 12
#> # Groups:   Year, Group [?]
#>    Year   Group     N      V1_sd      V2_sd     V3_sd V1_median V2_median
#>   <dbl>  <fctr> <int>      <dbl>      <dbl>     <dbl>     <dbl>     <dbl>
#> 1  2012 Control     2 0.05170954 0.29422635 0.1152669 0.3037848 0.6193239
#> 2  2012   Treat     2 0.51092899 0.08307494 0.1229560 0.5734239 0.5408230
#> 3  2016 Control     3 0.32043716 0.34402222 0.3822026 0.3823880 0.4935413
#> 4  2016   Treat     3 0.37759667 0.29566739 0.1233162 0.3861141 0.6684667
#> # ... with 4 more variables: V3_median <dbl>, V1_mean <dbl>,
#> #   V2_mean <dbl>, V3_mean <dbl>
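In more recent dplyr releases (1.0.0 and later), funs() and the add argument of group_by() are deprecated, so the code above may emit warnings. The following is only a sketch of an alternative, not part of the original answer: it assumes dplyr >= 1.0.0 and uses across(), which lets n() sit directly inside summarise() so the count appears once per group. The name summary_table is chosen here for illustration.

```r
library(dplyr)

set.seed(1)
df <- data.frame(
  Year  = sample(c(2012, 2016), 10, replace = TRUE),
  Group = sample(c("Treat", "Control"), 10, replace = TRUE),
  V1 = runif(10),
  V2 = runif(10),
  V3 = runif(10)
)

summary_table <- df %>%
  group_by(Year, Group) %>%
  summarise(
    N = n(),  # one count per Year/Group combination
    # apply each function to V1, V2 and V3; columns are named e.g. V1_sd
    across(V1:V3, list(sd = sd, median = median, mean = mean)),
    .groups = "drop"  # return an ungrouped tibble
  )

summary_table
```

The column order differs from the answer above: across() emits all statistics for V1 first, then V2, then V3. If the grouping-column approach is preferred, group_by(N = n(), .add = TRUE) is the non-deprecated spelling in those same releases.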