Using dplyr's group_by to perform split-apply-combine

Problem description

I am trying to use dplyr to do the following:
 tapply(iris$Petal.Length, iris$Species, shapiro.test)
I want to split the Petal.Lengths by Species and apply a function to each group, in this case shapiro.test. I read this SO question and quite a number of other pages. I am sort of able to split the variable into groups, using do:
iris %>%
  group_by(Species) %>%
  select(Petal.Length) %>%
  do(print(.$Petal.Length))

 [1] 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 1.5 1.6 1.4 1.1 1.2
[16] 1.5 1.3 1.4 1.7 1.5 1.7 1.5 1.0 1.7 1.9 1.6 1.6 1.5 1.4 1.6
[31] 1.6 1.5 1.5 1.4 1.5 1.2 1.3 1.4 1.3 1.5 1.3 1.3 1.3 1.6 1.9
[46] 1.4 1.6 1.4 1.5 1.4
 [1] 4.7 4.5 4.9 4.0 4.6 4.5 4.7 3.3 4.6 3.9 3.5 4.2 4.0 4.7 3.6
[16] 4.4 4.5 4.1 4.5 3.9 4.8 4.0 4.9 4.7 4.3 4.4 4.8 5.0 4.5 3.5
[31] 3.8 3.7 3.9 5.1 4.5 4.5 4.7 4.4 4.1 4.0 4.4 4.6 4.0 3.3 4.2
[46] 4.2 4.2 4.3 3.0 4.1
The 'splitting' of the column into groups seems to be working, but the way to pass the pieces to shapiro.test is still eluding me. I see that group_by does not behave the same way as split.
I tried lots of variations, including:
iris %>%
  group_by(Species) %>%
  select(Petal.Length) %>%
  summarise(shapiro.test)
and also
iris %>%
  group_by(Species) %>%
  select(Petal.Length) %>%
  summarise_each(funs(shapiro.test))

 # Error: expecting a single value
How can I make dplyr run shapiro.test() thrice, once for the Petal.Lengths of each Species?
Solution
I can see two ways to do it, depending on how you want to use the output. You could pull out just the p-values from shapiro.test in summarise. Alternatively, you could use do and save the full result of each test in a list.
library(dplyr)
With summarise, pulling out just the p-values:
iris %>%
    group_by(Species) %>%
    summarise(stest = shapiro.test(Petal.Length)$p.value)

     Species      stest
1     setosa 0.05481147
2 versicolor 0.15847784
3  virginica 0.10977537
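If the test statistic is wanted alongside the p-value, the same summarise pattern can probably be extended with one more column. This is only a sketch: the column names W and p.value and the result name shapiro_summary are my own choices, and the test is simply run twice per group to keep the code short.

```r
library(dplyr)

# One row per species, with the Shapiro-Wilk statistic and its p-value.
# unname() drops the "W" name that shapiro.test attaches to the statistic.
shapiro_summary <- iris %>%
    group_by(Species) %>%
    summarise(
        W = unname(shapiro.test(Petal.Length)$statistic),
        p.value = shapiro.test(Petal.Length)$p.value
    )

shapiro_summary
```

Each expression inside summarise has to return a single value per group, which is why the individual pieces of the test result are extracted instead of passing shapiro.test itself.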
Using do:
tests = iris %>%
    group_by(Species) %>%
    do(test = shapiro.test(.$Petal.Length))

# Resulting list
tests$test

[[1]]

    Shapiro-Wilk normality test

data:  .$Petal.Length
W = 0.955, p-value = 0.05481


[[2]]

    Shapiro-Wilk normality test

data:  .$Petal.Length
W = 0.966, p-value = 0.1585


[[3]]

    Shapiro-Wilk normality test

data:  .$Petal.Length
W = 0.9622, p-value = 0.1098
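Once the full test objects are stored in the list, individual pieces can be pulled back out when needed. A minimal sketch, assuming the tests object is built as above; the name results and its column names are hypothetical, chosen only for illustration:

```r
library(dplyr)

# Rebuild the list of test results, one shapiro.test object per species
tests <- iris %>%
    group_by(Species) %>%
    do(test = shapiro.test(.$Petal.Length))

# Collect the statistic and p-value of each stored test into a data frame
results <- data.frame(
    Species = tests$Species,
    W = vapply(tests$test, function(x) unname(x$statistic), numeric(1)),
    p.value = vapply(tests$test, function(x) x$p.value, numeric(1))
)

results
```

Keeping the whole object has the advantage that anything else shapiro.test returns stays available later, at the cost of one extra extraction step.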
                        