A dplyr function to compute the mean, n, sd, and standard error

Question

I find myself writing this bit of code all the time to produce standard errors for group means (which I then use for plotting confidence intervals).

It would be nice to write my own function that does this in one line of code, though. I have read the dplyr vignette on non-standard evaluation (NSE) and this blog post as well. I follow it somewhat, but I'm too much of a noob to figure this out on my own. Can anyone help out? Thanks.

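library(dplyr)

# Example data: a two-level grouping variable and a numeric variable
var1 <- sample(c('red', 'green'), size = 10, replace = TRUE)
var2 <- rnorm(10, mean = 5, sd = 1)
df <- data.frame(var1, var2)

# Mean, count, standard deviation, and standard error per group
df %>%
  group_by(var1) %>%
  summarize(avg = mean(var2), n = n(), sd = sd(var2), se = sd / sqrt(n))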
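
Since the standard errors are meant for confidence intervals, here is a minimal sketch of that next step. It assumes a 95% level and a t-based interval; the column names lower and upper, the object name ci_data, and the fixed toy data are illustrative and not part of the original question.

```r
library(dplyr)

# Toy data with fixed group sizes (an assumption, so every group has n >= 2)
ci_data <- data.frame(
  var1 = rep(c("red", "green"), times = c(4, 6)),
  var2 = c(4.1, 5.3, 4.8, 5.6, 5.0, 4.4, 6.1, 5.2, 4.9, 5.5)
)

ci <- ci_data %>%
  group_by(var1) %>%
  summarize(avg = mean(var2), n = n(), sd = sd(var2), se = sd / sqrt(n)) %>%
  # t-based 95% interval: mean +/- t(0.975, n - 1) * se
  mutate(lower = avg - qt(0.975, df = n - 1) * se,
         upper = avg + qt(0.975, df = n - 1) * se)
ci
```

The lower and upper columns can then be mapped to error bars when plotting. With large groups a normal quantile (about 1.96) would give nearly the same bounds.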

Recommended answer

You can use enquo to capture the column names passed to your function, then unquote them with !! inside the dplyr verbs:

my_fun <- function(x, cat_var, num_var){
  cat_var <- enquo(cat_var)
  num_var <- enquo(num_var)

  x %>%
    group_by(!!cat_var) %>%
    summarize(avg = mean(!!num_var), n = n(),
              sd = sd(!!num_var), se = sd/sqrt(n))
}

This gives:

> my_fun(df, var1, var2)
# A tibble: 2 x 5
    var1      avg     n        sd        se
  <fctr>    <dbl> <int>     <dbl>     <dbl>
1  green 4.873617     7 0.7515280 0.2840509
2    red 5.337151     3 0.1383129 0.0798550

This matches the output of your example:

> df %>%
+   group_by(var1) %>%
+   summarize(avg=mean(var2), n=n(), sd=sd(var2), se=sd/sqrt(n))
# A tibble: 2 x 5
    var1      avg     n        sd        se
  <fctr>    <dbl> <int>     <dbl>     <dbl>
1  green 4.873617     7 0.7515280 0.2840509
2    red 5.337151     3 0.1383129 0.0798550
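
In more recent releases (rlang 0.4.0 and later, with a matching dplyr), the enquo plus !! pair can usually be written with the embrace operator {{ }}. The following is a sketch of the same function in that style, not the answer's original code; the name my_fun_embrace and the toy data are made up for illustration.

```r
library(dplyr)

# Same logic as my_fun, with {{ }} doing the capture and unquote in one step
my_fun_embrace <- function(x, cat_var, num_var) {
  x %>%
    group_by({{ cat_var }}) %>%
    summarize(avg = mean({{ num_var }}), n = n(),
              sd = sd({{ num_var }}), se = sd / sqrt(n))
}

# Small hand-made data set (hypothetical) to exercise the function
toy <- data.frame(
  var1 = c("red", "red", "green", "green", "green"),
  var2 = c(4, 6, 1, 2, 3)
)
res <- my_fun_embrace(toy, var1, var2)
res
```

The call site is unchanged: columns are still passed as bare names, as in my_fun(df, var1, var2).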

The OP has asked to remove the group_by statement from the function so that it can group by more than one variable. There are two ways to go about this, in my opinion. First, you could simply remove the group_by statement and pipe an already grouped data frame into the function. That method would look like this:

my_fun <- function(x, num_var){
  num_var <- enquo(num_var)

  x %>%
    summarize(avg = mean(!!num_var), n = n(),
              sd = sd(!!num_var), se = sd/sqrt(n))
}

df %>%
  group_by(var1) %>%
  my_fun(var2)

Another way to go about this is to use ... and quos so that the function can capture multiple arguments for the group_by statement. That would look like this:

# first, build the new data frame
var1 <- sample(c('red', 'green'), size = 10, replace = TRUE)
var2 <- rnorm(10, mean = 5, sd = 1)
var3 <- sample(c("A", "B"), size = 10, replace = TRUE)
df <- data.frame(var1, var2, var3)

# using the version of `my_fun` without group_by (defined above), it would look like this
df %>%
  group_by(var1, var3) %>%
  my_fun(var2)

# A tibble: 4 x 6
# Groups:   var1 [?]
    var1   var3      avg     n        sd        se
  <fctr> <fctr>    <dbl> <int>     <dbl>     <dbl>
1  green      A 5.248095     1       NaN       NaN
2  green      B 5.589881     2 0.7252621 0.5128378
3    red      A 5.364265     2 0.5748759 0.4064986
4    red      B 4.908226     5 1.1437186 0.5114865

# Now doing it with a new function `my_fun2`
my_fun2 <- function(x, num_var, ...){
  group_var <- quos(...)
  num_var <- enquo(num_var)

  x %>%
    group_by(!!!group_var) %>%
    summarize(avg = mean(!!num_var), n = n(),
              sd = sd(!!num_var), se = sd/sqrt(n))
}

df %>%
  my_fun2(var2, var1, var3)

# A tibble: 4 x 6
# Groups:   var1 [?]
    var1   var3      avg     n        sd        se
  <fctr> <fctr>    <dbl> <int>     <dbl>     <dbl>
1  green      A 5.248095     1       NaN       NaN
2  green      B 5.589881     2 0.7252621 0.5128378
3    red      A 5.364265     2 0.5748759 0.4064986
4    red      B 4.908226     5 1.1437186 0.5114865
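
With dplyr 1.0.0 or later, a similar effect can be sketched by forwarding ... straight to group_by and using the .groups argument of summarize to control the grouping of the result. This is an alternative under those version assumptions, not the answer's code; my_fun3 and the toy data are hypothetical names chosen for illustration.

```r
library(dplyr)

# Grouping columns are forwarded through ...; .groups = "drop" returns an
# ungrouped tibble instead of one still grouped by the first variable
my_fun3 <- function(x, num_var, ...) {
  x %>%
    group_by(...) %>%
    summarize(avg = mean({{ num_var }}), n = n(),
              sd = sd({{ num_var }}), se = sd / sqrt(n),
              .groups = "drop")
}

# Hand-made data (hypothetical); the green/A group has a single observation
toy2 <- data.frame(
  var1 = c("green", "green", "green", "red", "red"),
  var3 = c("A", "B", "B", "A", "A"),
  var2 = c(5, 4, 6, 1, 3)
)
res2 <- toy2 %>% my_fun3(var2, var1, var3)
res2
```

As in the output above, a group with only one observation has no defined standard deviation, so its sd and se are missing.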
