Question
I'm trying to get multiple summary statistics in R/S-PLUS, grouped by a categorical column, in one shot. I found a couple of functions, but each of them computes only one statistic per call, like aggregate():
data <- c(62, 60, 63, 59, 63, 67, 71, 64, 65, 66, 68, 66,
71, 67, 68, 68, 56, 62, 60, 61, 63, 64, 63, 59)
grp <- factor(rep(LETTERS[1:4], c(4,6,6,8)))
df <- data.frame(group=grp, dt=data)
# 'by' must be a list of grouping vectors, not a bare factor
mg <- aggregate(df$dt, by=list(group=df$group), FUN=mean)
mg <- aggregate(df$dt, by=list(group=df$group), FUN=sum)
What I'm looking for is a way to get multiple statistics for the same group, such as mean, min, max, and standard deviation, in one call. Is that doable?
Recommended answer
1. tapply
I'll put in my two cents for tapply():
tapply(df$dt, df$group, summary)
You could write a custom function that returns the specific statistics you want, or format the results:
tapply(df$dt, df$group,
function(x) format(summary(x), scientific = TRUE))
$A
Min. 1st Qu. Median Mean 3rd Qu. Max.
"5.900e+01" "5.975e+01" "6.100e+01" "6.100e+01" "6.225e+01" "6.300e+01"
$B
Min. 1st Qu. Median Mean 3rd Qu. Max.
"6.300e+01" "6.425e+01" "6.550e+01" "6.600e+01" "6.675e+01" "7.100e+01"
$C
Min. 1st Qu. Median Mean 3rd Qu. Max.
"6.600e+01" "6.725e+01" "6.800e+01" "6.800e+01" "6.800e+01" "7.100e+01"
$D
Min. 1st Qu. Median Mean 3rd Qu. Max.
"5.600e+01" "5.975e+01" "6.150e+01" "6.100e+01" "6.300e+01" "6.400e+01"
2. data.table
The data.table package offers a lot of helpful and fast tools for these types of operations:
library(data.table)
setDT(df)
> df[, as.list(summary(dt)), by = group]
group Min. 1st Qu. Median Mean 3rd Qu. Max.
1: A 59 59.75 61.0 61 62.25 63
2: B 63 64.25 65.5 66 66.75 71
3: C 66 67.25 68.0 68 68.00 71
4: D 56 59.75 61.5 61 63.00 64
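For a custom set of statistics rather than the summary() defaults, a data.table call along the following lines should work. This is a minimal sketch: the table name dt_tbl and the chosen statistics are assumptions for illustration, and the data is again the sample from the question.

```r
library(data.table)

# Sample data from the question, built directly as a data.table
dt_tbl <- data.table(
  group = factor(rep(LETTERS[1:4], c(4, 6, 6, 8))),
  dt = c(62, 60, 63, 59, 63, 67, 71, 64, 65, 66, 68, 66,
         71, 67, 68, 68, 56, 62, 60, 61, 63, 64, 63, 59)
)

# .() builds a list, so each named element becomes one column per group
res <- dt_tbl[, .(mean = mean(dt), min = min(dt), max = max(dt), sd = sd(dt)),
              by = group]
print(res)
```

Naming each element inside .() gives readable column names, and adding or removing a statistic is a one-line change.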