When I create data summaries with dplyr, I often find myself computing CIs (using CI from Rmisc):

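library(Rmisc)  # provides CI(); it also attaches plyr, so load it before dplyr
library(dplyr)  # loaded second so that dplyr's summarize() masks plyr's

summary <- data %>%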
  group_by(group1, group2) %>%
  summarize(
    var1.mean = CI(var1, ci=0.95)['mean'],
    var1.lower = CI(var1, ci=0.95)['lower'],
    var1.upper = CI(var1, ci=0.95)['upper'],

    var2.mean = CI(var2, ci=0.95)['mean'],
    var2.lower = CI(var2, ci=0.95)['lower'],
    var2.upper = CI(var2, ci=0.95)['upper'],

    var3.mean = CI(var3, ci=0.95)['mean'],
    var3.lower = CI(var3, ci=0.95)['lower'],
    var3.upper = CI(var3, ci=0.95)['upper'],

    var4 = sum(var4)
  )

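This is verbose and inefficient, since CI() is evaluated three times for every variable. Ideally, I would like to write something like: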
summary <- data %>%
  group_by(group1, group2) %>%
  summarize(
    var1 = CI(var1, ci=0.95),
    var2 = CI(var2, ci=0.95),
    var3 = CI(var3, ci=0.95),
    var4 = sum(var4)
  )

For the code above, since CI returns a named vector with the elements
  • "lower",
  • "upper",
  • "mean"

I would like to get a data frame whose columns look like:
  • "group1",
  • "group2",
  • "var1.lower",
  • "var1.mean",
  • "var1.upper",
  • "var2.lower",
  • ...,
  • "var3.upper",
  • "var4"

Any idea how to achieve this? Is there a way to "flatten" columns in R? Something like do.call, but applied the way argument spreading works in JS or Python?

There may be something to be done with quasiquotation, but that is starting to go beyond my R skills.
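Quasiquotation can in fact remove the repetition. rlang's !!! splices a named list of expressions into summarise() as separate named arguments, which is roughly the R counterpart of argument spreading in JS or Python. The following is a minimal sketch, assuming dplyr ≥ 1.0 and tidyr ≥ 1.0. The data are made up, and ci_vec() is a hypothetical stand-in for Rmisc::CI() (same t-based interval and element names), defined so the example runs without Rmisc.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for Rmisc::CI(): same t-based interval and the
# same element names (upper, mean, lower).
ci_vec <- function(x, ci = 0.95) {
  m <- mean(x)
  err <- qt(ci + (1 - ci) / 2, df = length(x) - 1) * sd(x) / sqrt(length(x))
  c(upper = m + err, mean = m, lower = m - err)
}

# Made-up data: four groups of two observations each.
data <- data.frame(
  group1 = rep(c("A", "B"), each = 4),
  group2 = rep(c("A", "B"), each = 2, times = 2),
  var1 = c(1, 3, 2, 4, 5, 7, 6, 10),
  var2 = c(2, 6, 3, 5, 1, 4, 8, 9),
  var3 = c(0, 4, 1, 3, 2, 6, 3, 5),
  var4 = 1:8
)

ci_vars <- c("var1", "var2", "var3")

# Build one expression per column, e.g. list(data.frame(t(ci_vec(var1)))),
# and name it after that column.
ci_exprs <- lapply(rlang::syms(ci_vars),
                   function(v) rlang::expr(list(data.frame(t(ci_vec(!!v))))))
names(ci_exprs) <- ci_vars

# !!! splices the expressions in as if var1 = ..., var2 = ..., var3 = ...
# had been typed out by hand; unnest() then flattens the list-columns.
result <- data %>%
  group_by(group1, group2) %>%
  summarise(!!!ci_exprs, var4 = sum(var4), .groups = "drop") %>%
  unnest(all_of(ci_vars), names_sep = ".")

print(result)
```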

I used to use this gist with plyr, but it no longer works with dplyr. Rather than recoding it again, I hope there is a better way than hacking the library.

Best answer

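We can use tidyr::unnest if we first format the output of CI as a data.frame: one row per group, wrapped in a list so that summarize() stores it in a list-column.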
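The mechanism can be seen in isolation. When every cell of a list-column holds a one-row data frame, unnest() turns the cells into ordinary columns, and names_sep prefixes each inner name with the list-column's name. The following is a minimal sketch with made-up values, assuming tidyr ≥ 1.0, where names_sep replaced the older .sep argument.

```r
library(tibble)
library(tidyr)

# Made-up list-column: each cell is a one-row data frame built the same way
# as data.frame(t(CI(x))), i.e. from a named vector turned into a single row.
nested <- tibble(
  group = c("A", "B"),
  stats = list(
    data.frame(t(c(upper = 3, mean = 2, lower = 1))),
    data.frame(t(c(upper = 6, mean = 5, lower = 4)))
  )
)

# Expand the one-row data frames into columns stats.upper, stats.mean, stats.lower.
flat <- unnest(nested, stats, names_sep = ".")
print(flat)
```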
Data

library(Rmisc)
library(dplyr)
library(tidyr)
set.seed(1)
data <- data.frame(group1 = sample(c("A","B"), 10, TRUE),
                   group2 = sample(c("A","B"), 10, TRUE),
                   var1 = sample(10),
                   var2 = sample(10),
                   var3 = sample(10),
                   var4 = sample(10))

General solution
data %>% group_by(group1, group2) %>%
  # wrap each one-row data.frame in list() so summarize() keeps it in a list-column
  dplyr::summarize(var1 = list(data.frame(t(CI(var1, ci=0.95)))),
                   var2 = list(data.frame(t(CI(var2, ci=0.95)))),
                   var3 = list(data.frame(t(CI(var3, ci=0.95)))),
                   var4 = sum(var4)) %>%
  unnest(c(var1, var2, var3), names_sep = ".")

Result (as originally posted, produced with older versions of R and tidyr; current versions may give different sampled values and column order)
# A tibble: 4 x 12
# Groups:   group1 [2]
#   group1 group2  var4 var1.upper var1.mean var1.lower var2.upper var2.mean  var2.lower var3.upper var3.mean var3.lower
#   <fctr> <fctr> <int>      <dbl>     <dbl>      <dbl>      <dbl>     <dbl>       <dbl>      <dbl>     <dbl>      <dbl>
# 1      A      A    13  56.824819       6.0 -44.824819   11.85310  5.500000  -0.8531024   26.55931  7.500000 -11.559307
# 2      A      B    11  38.265512       6.5 -25.265512   50.97172  6.500000 -37.9717166   25.55931  6.500000 -12.559307
# 3      B      A    11  12.956686       4.0  -4.956686   13.65205  5.666667  -2.3187188   15.07146  5.666667  -3.738127
# 4      B      B    20   8.484138       6.0   3.515862   14.70619  4.666667  -5.3728564   11.31872  3.333333  -4.652052
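With dplyr ≥ 1.0, the list() wrapper should not be strictly necessary. A named summarise() argument that returns a one-row data frame becomes a packed data-frame column. tidyr::unpack() can then flatten that column with the same names_sep. This sketch assumes those versions, uses made-up data, and again uses a hypothetical ci_vec() as a stand-in for Rmisc::CI().

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for Rmisc::CI() (same interval, same element names).
ci_vec <- function(x, ci = 0.95) {
  m <- mean(x)
  err <- qt(ci + (1 - ci) / 2, df = length(x) - 1) * sd(x) / sqrt(length(x))
  c(upper = m + err, mean = m, lower = m - err)
}

# Made-up data: four groups of two observations each.
data <- data.frame(
  group1 = rep(c("A", "B"), each = 4),
  group2 = rep(c("A", "B"), each = 2, times = 2),
  var1 = c(1, 3, 2, 4, 5, 7, 6, 10),
  var2 = c(2, 6, 3, 5, 1, 4, 8, 9),
  var3 = c(0, 4, 1, 3, 2, 6, 3, 5),
  var4 = 1:8
)

# Each named data-frame result is stored as a packed df-column;
# unpack() spreads it into var1.upper, var1.mean, var1.lower, ...
result <- data %>%
  group_by(group1, group2) %>%
  summarise(var1 = data.frame(t(ci_vec(var1))),
            var2 = data.frame(t(ci_vec(var2))),
            var3 = data.frame(t(ci_vec(var3))),
            var4 = sum(var4),
            .groups = "drop") %>%
  unpack(c(var1, var2, var3), names_sep = ".")

print(result)
```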
    

Or with a custom CI function (same output)
CI2 <- function(x, ci=0.95) list(data.frame(t(CI(x, ci))))  # one-row data.frame wrapped in a list

data %>% group_by(group1, group2) %>%
  dplyr::summarize(var1 = CI2(var1, ci=0.95),
                   var2 = CI2(var2, ci=0.95),
                   var3 = CI2(var3, ci=0.95),
                   var4 = sum(var4)) %>%
  unnest(c(var1, var2, var3), names_sep = ".")
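The one-argument-per-variable repetition can also be folded away. This sketch assumes dplyr ≥ 1.0, whose across() applies a single function to several columns inside summarise() and keeps the original column names. The data are made up, and ci_vec() is again a hypothetical stand-in for Rmisc::CI().

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for Rmisc::CI() (same interval, same element names).
ci_vec <- function(x, ci = 0.95) {
  m <- mean(x)
  err <- qt(ci + (1 - ci) / 2, df = length(x) - 1) * sd(x) / sqrt(length(x))
  c(upper = m + err, mean = m, lower = m - err)
}

# Same idea as CI2 above: a one-row data frame wrapped in a list.
CI2 <- function(x, ci = 0.95) list(data.frame(t(ci_vec(x, ci))))

# Made-up data: four groups of two observations each.
data <- data.frame(
  group1 = rep(c("A", "B"), each = 4),
  group2 = rep(c("A", "B"), each = 2, times = 2),
  var1 = c(1, 3, 2, 4, 5, 7, 6, 10),
  var2 = c(2, 6, 3, 5, 1, 4, 8, 9),
  var3 = c(0, 4, 1, 3, 2, 6, 3, 5),
  var4 = 1:8
)

# across() applies CI2 to var1, var2 and var3, producing three list-columns.
result <- data %>%
  group_by(group1, group2) %>%
  summarise(across(c(var1, var2, var3), CI2), var4 = sum(var4), .groups = "drop") %>%
  unnest(c(var1, var2, var3), names_sep = ".")

print(result)
```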
    

Or with a converter function (same output), which can be used with any other function that returns a named vector:
vec2rowdf <- function(v) list(data.frame(t(v))) # creates a 1-row data.frame from a vector, wrapped in a list
data %>% group_by(group1, group2) %>%
  dplyr::summarize(var1 = CI(var1, ci=0.95) %>% vec2rowdf,
                   var2 = CI(var2, ci=0.95) %>% vec2rowdf,
                   var3 = CI(var3, ci=0.95) %>% vec2rowdf,
                   var4 = sum(var4)) %>%
  unnest(c(var1, var2, var3), names_sep = ".")
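The following sketch shows vec2rowdf() with a different vector-returning function, range(), to illustrate that the converter is not tied to CI. Because range() returns an unnamed vector, the names min and max are attached by hand. These names are illustrative choices and not part of the original answer. The data are made up, and tidyr ≥ 1.0 is assumed.

```r
library(dplyr)
library(tidyr)

# Same converter as above: vector -> one-row data frame, wrapped in a list.
vec2rowdf <- function(v) list(data.frame(t(v)))

# Made-up data: two groups of four observations.
data <- data.frame(
  group1 = rep(c("A", "B"), each = 4),
  var1 = c(1, 3, 2, 4, 5, 7, 6, 10)
)

result <- data %>%
  group_by(group1) %>%
  summarise(var1 = vec2rowdf(setNames(range(var1), c("min", "max"))),
            n = n(),
            .groups = "drop") %>%
  unnest(var1, names_sep = ".")

print(result)
```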
    

Regarding r - Flatten columns as arguments, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/46251040/

10-12 17:43