Problem description
When I use group_by and summarise in dplyr, I can naturally apply different summary functions to different variables. For instance:
library(tidyverse)
df <- tribble(
  ~category, ~x, ~y, ~z,
  #----------------------
  'a', 4, 6, 8,
  'a', 7, 3, 0,
  'a', 7, 9, 0,
  'b', 2, 8, 8,
  'b', 5, 1, 8,
  'b', 8, 0, 1,
  'c', 2, 1, 1,
  'c', 3, 8, 0,
  'c', 1, 9, 1
)
df %>% group_by(category) %>% summarize(
  x = mean(x),
  y = median(y),
  z = first(z)
)
Output:
# A tibble: 3 x 4
category x y z
<chr> <dbl> <dbl> <dbl>
1 a 6 6 8
2 b 5 1 8
3 c 2 8 1
My question is, how would I do this with summarise_at? Obviously for this example it's unnecessary, but assume I have lots of variables that I want to take the mean of, lots of medians, etc.
Do I lose this functionality once I move to summarise_at? Do I have to use all functions on all groups of variables and then throw away the ones I don't want?
Perhaps I'm just missing something, but I can't figure it out, and I don't see any examples of this in the documentation. Any help is appreciated.
Recommended answer
Here is one idea.
library(tidyverse)
# Summarise each set of columns with its own function
df_mean <- df %>%
  group_by(category) %>%
  summarize_at(vars(x), mean)
df_median <- df %>%
  group_by(category) %>%
  summarize_at(vars(y), median)
df_first <- df %>%
  group_by(category) %>%
  summarize_at(vars(z), first)
# Join the per-function summaries back together by group
df_summary <- reduce(list(df_mean, df_median, df_first),
                     left_join, by = "category")
As you said, there is no need to use summarise_at for this example. However, if you have many columns that need to be summarized by different functions, this strategy may work. You will need to specify the columns in vars(...) for each summarize_at call. The selection rules are the same as for the dplyr::select function.
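To illustrate the point about vars(...) following the select rules, here is a minimal sketch. It assumes the same data as in the question, rebuilt with tibble() so the block runs on its own; the x:y range and the starts_with("z") helper are only illustrative choices, not part of the original answer.

```r
library(dplyr)

# Same data as in the question, rebuilt so this block is self-contained
df <- tibble(
  category = rep(c("a", "b", "c"), each = 3),
  x = c(4, 7, 7, 2, 5, 8, 2, 3, 1),
  y = c(6, 3, 9, 8, 1, 0, 1, 8, 9),
  z = c(8, 0, 0, 8, 8, 1, 1, 0, 1)
)

# A column range, written the same way as in select(df, x:y)
df_mean <- df %>%
  group_by(category) %>%
  summarise_at(vars(x:y), mean)

# A selection helper, written the same way as in select(df, starts_with("z"))
df_median <- df %>%
  group_by(category) %>%
  summarise_at(vars(starts_with("z")), median)

# Combine the two summaries by the grouping column
df_summary <- left_join(df_mean, df_median, by = "category")
```

With many columns, a range or helper like these keeps each summarise_at call short, since the columns do not have to be listed one by one.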
Here is another idea. Define a function that wraps summarise_at, and then use map2 to apply it over a look-up list that pairs each function name with the variables it should be applied to. In this example, I applied mean to the x and y columns and median to z.
# Define a function that applies one named function to a set of columns
summarise_at_fun <- function(variable, func, data) {
  # variable: character vector of column names; func: function name as a string
  data %>%
    summarise_at(variable, match.fun(func))
}
# Group the data
df2 <- df %>% group_by(category)
# Create a look-up list: names are function names, values are the columns to apply them to
look_list <- list(mean = c("x", "y"),
                  median = "z")
# Apply summarise_at_fun to each entry, then join the results by group
map2(look_list, names(look_list), summarise_at_fun, data = df2) %>%
  reduce(left_join, by = "category")
# A tibble: 3 x 4
category x y z
<chr> <dbl> <dbl> <dbl>
1 a 6 6 0
2 b 5 3 8
3 c 2 6 1
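As a possible alternative, not part of the original answer: assuming dplyr 1.0.0 or later is installed, across() lets a single summarise() call apply different functions to different sets of columns, so no joins are needed. This is only a sketch; the data is rebuilt with tibble() so the block is self-contained.

```r
library(dplyr)

# Same data as in the question, rebuilt so this block is self-contained
df <- tibble(
  category = rep(c("a", "b", "c"), each = 3),
  x = c(4, 7, 7, 2, 5, 8, 2, 3, 1),
  y = c(6, 3, 9, 8, 1, 0, 1, 8, 9),
  z = c(8, 0, 0, 8, 8, 1, 1, 0, 1)
)

# Each across() call pairs a column selection with the function to apply
res <- df %>%
  group_by(category) %>%
  summarise(
    across(c(x, y), mean),   # mean of x and y
    across(z, median)        # median of z
  )

res
```

This should give the same x, y and z values as the table above, with one row per category.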