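I'm trying to run a normality test (Shapiro-Wilk) on a data set, and I'd like to get the test statistic and p-value for all columns at once. I've read the other SO pages on this topic (R: Shapiro test by group won't produce p-values and corrupt data frame warning; Using shapiro.test on multiple columns in a data frame) but still can't figure it out. Any help would be much appreciated!
For example, here is the data set: it has one character vector (NVL) and the remaining columns are numeric, and I want to group by NVL (NV / VL).
NVL Var1 Var2 Var3 Var4 Var5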
1. NV 22.5 26.8 89.2 35.7 100
2. NV 34.7 67.4 29.8 12.4 100
3. NV 68.3 34.5 44.5 23.8 100
4. NV 11.2 55.3 17.5 77.9 100
5. VL 55.6 77.2 59.7 89.6 100
6. VL 60.5 88.7 65.4 99.6 100
7. VL 89.4 87.5 65.9 89.5 100
8. VL 65.4 74.2 75.4 89.5 100
9. VL 81.8 78.5 95.4 92.5 100
Here is the code:
library(dplyr)
normalityVar1<-mydata %>%
group_by(NVL) %>%
summarise(statistic = shapiro.test(Var1)$statistic,
p.value = shapiro.test(Var1)$p.value)
Here is the output:
NVL statistic p.value
<chr> <dbl> <dbl>
1 VL 0.9125239 0.1985486
2 NV 0.8983501 0.2101248
Now, how do I edit this code so that I get the output for all variables (Var2, 3, 4, 5) at once? I even tried aggregate and similar approaches, but I'm stuck.
aggregate(formula = Var1 ~ NVL,
data = mydata,
FUN = function(x) {y <- shapiro.test(x); c(y$statistic, y$p.value)})
As you can see, I can only do this for one variable! I know I'm close, but I can't figure out the rest. Thanks in advance for your help!
Best answer
mydata <- read.table(text="
NVL Var1 Var2 Var3 Var4 Var5
1 NV 22.5 26.8 89.2 35.7 100
2 NV 34.7 67.4 29.8 12.4 100
3 NV 68.3 34.5 44.5 23.8 50
4 NV 11.2 55.3 17.5 77.9 100
5 VL 55.6 77.2 59.7 89.6 100
6 VL 60.5 88.7 65.4 99.6 100
7 VL 89.4 87.5 65.9 89.5 100
8 VL 65.4 74.2 75.4 89.5 90
9 VL 81.8 78.5 95.4 92.5 90
", header=TRUE)
library(dplyr)
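# Run the Shapiro-Wilk test on one column x, separately for each level of group
myfun <- function(x, group) {
  data.frame(x, group) %>%
    group_by(group) %>%
    summarise(
      # shapiro.test() stops with an error if all values are identical, so return NA in that case
      statistic = ifelse(sd(x)!=0, shapiro.test(x)$statistic, NA),
      p.value = ifelse(sd(x)!=0, shapiro.test(x)$p.value, NA)
    )
}
# Apply myfun to every column except the first (NVL), which is the grouping variable
(lst <- lapply(mydata[,-1], myfun, group=mydata[,1]))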
The output is:
$Var1
# A tibble: 2 x 3
group statistic p.value
<fctr> <dbl> <dbl>
1 NV 0.9313476 0.6023421
2 VL 0.9149572 0.4979450
$Var2
# A tibble: 2 x 3
group statistic p.value
<fctr> <dbl> <dbl>
1 NV 0.9409576 0.6601747
2 VL 0.8736587 0.2815562
$Var3
# A tibble: 2 x 3
group statistic p.value
<fctr> <dbl> <dbl>
1 NV 0.9096322 0.4804557
2 VL 0.8644349 0.2446131
$Var4
# A tibble: 2 x 3
group statistic p.value
<fctr> <dbl> <dbl>
1 NV 0.9003135 0.43261822
2 VL 0.7260939 0.01760713
$Var5
# A tibble: 2 x 3
group statistic p.value
<fctr> <dbl> <dbl>
1 NV 0.6297763 0.001240726
2 VL 0.6840289 0.006470001
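The `sd(x) != 0` check in `myfun` is there because `shapiro.test` stops with an error when every value in a group is identical, which is what happens with the question's original Var5 (100 in every row). A minimal sketch of the same guard as a standalone function is given here; the name `safe_shapiro` is hypothetical and not part of the answer, and the sample-size check reflects the documented requirement of 3 to 5000 non-missing values:

```r
# Hypothetical helper (not from the original answer): returns NA for both
# values instead of stopping when the test cannot be computed.
safe_shapiro <- function(x) {
  x <- x[!is.na(x)]
  n <- length(x)
  # shapiro.test() needs 3..5000 non-missing values that are not all identical
  if (n < 3 || n > 5000 || diff(range(x)) < 1e-10) {
    return(c(statistic = NA_real_, p.value = NA_real_))
  }
  res <- shapiro.test(x)
  c(statistic = unname(res$statistic), p.value = unname(res$p.value))
}

safe_shapiro(c(100, 100, 100, 100))      # constant input: both values are NA
safe_shapiro(c(22.5, 34.7, 68.3, 11.2))  # Var1 values of the NV group
```

Returning a named numeric vector of length two keeps the helper easy to use with `sapply` or inside `summarise`.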
The output list lst can be converted into a data.frame object:
do.call(cbind, lst)[,-seq(4,3*(ncol(mydata)-1),3)]
Here is the output:
Var1.group Var1.statistic Var1.p.value Var2.statistic Var2.p.value Var3.statistic Var3.p.value Var4.statistic Var4.p.value Var5.statistic Var5.p.value
1 NV 0.9313476 0.6023421 0.9409576 0.6601747 0.9096322 0.4804557 0.9003135 0.43261822 0.6297763 0.001240726
2 VL 0.9149572 0.4979450 0.8736587 0.2815562 0.8644349 0.2446131 0.7260939 0.01760713 0.6840289 0.006470001
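If a long table (one row per variable and group) is easier to work with than the wide one, the same test can be run in a single pipeline by reshaping first. This is only a sketch of an alternative, not the answer's method: it assumes the tidyr package is installed (`pivot_longer` needs tidyr 1.0.0 or later, `.groups` needs dplyr 1.0.0 or later), and the column names `variable` and `value` are chosen here for illustration.

```r
library(dplyr)
library(tidyr)

# Same data as in the answer, without the row-number column
mydata <- read.table(text = "
NVL Var1 Var2 Var3 Var4 Var5
NV 22.5 26.8 89.2 35.7 100
NV 34.7 67.4 29.8 12.4 100
NV 68.3 34.5 44.5 23.8 50
NV 11.2 55.3 17.5 77.9 100
VL 55.6 77.2 59.7 89.6 100
VL 60.5 88.7 65.4 99.6 100
VL 89.4 87.5 65.9 89.5 100
VL 65.4 74.2 75.4 89.5 90
VL 81.8 78.5 95.4 92.5 90
", header = TRUE)

result <- mydata %>%
  # Stack Var1..Var5 into two columns: the variable name and its value
  pivot_longer(-NVL, names_to = "variable", values_to = "value") %>%
  group_by(variable, NVL) %>%
  summarise(
    # Same guard as in myfun: skip the test when the group is constant
    statistic = if (sd(value) != 0) shapiro.test(value)$statistic else NA_real_,
    p.value   = if (sd(value) != 0) shapiro.test(value)$p.value else NA_real_,
    .groups = "drop"
  )

print(result)
```

Here `result` has one row for each of the ten variable and group combinations, with the columns `variable`, `NVL`, `statistic` and `p.value`, so it can be filtered or sorted by p-value directly.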