Shapiro-Wilk test not working in R Markdown

Problem description

I am working on a piece of R Markdown code in which I simply apply a Shapiro-Wilk test to my data. When I run the code in RStudio in the usual way, I don't get any issue. But when I run the same code in R Markdown, I get the following error:

Error in shapiro.test(Metric) : sample size must be between 3 and 5000
Calls: <Anonymous> ... summarise -> summarise.tbl_df -> summarise_impl -> shapiro.test
In addition: There were 32 warnings (use warnings() to see them)

Warnings are:

Code is:

normality_test_on_data_PPM <- final_combined_data %>%
  group_by(PPM) %>%
  summarise(W = shapiro.test(Metric)$statistic, P.value = shapiro.test(Metric)$p.value) %>%
  ungroup() %>%
  mutate(P_Value = format(round(P.value, 3), nsmall = 3)) %>%
  select(PPM, P_Value) %>%
  mutate(Normal_test = ifelse(P_Value >= 0.05, "Normal", "Not Normal"))

Normality check results

DT::datatable(normality_test_on_data_PPM)

Solution

shapiro.test accepts only samples of between 3 and 5000 non-missing values, which is exactly what the error message says, so at least one PPM group falls outside that range when the document is knitted. The upper limit exists for good reason: in very large samples, even minute deviations from normality qualify as significant at conventional levels. See the discussion here: https://stats.stackexchange.com/questions/446262/can-a-sample-larger-than-5-000-data-points-be-tested-for-normality-using-shapiro. Alternatively, use the Kolmogorov-Smirnov test, ks.test, which has no such restriction or, perhaps even better, draw quantile-quantile (Q-Q) plots with qqnorm and qqline: if the points deviate from the straight reference line, that is a good diagnostic indicating that the data violate normality.
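To find out which groups trigger the error, it can help to count the non-missing values per group and to guard the test call. The following is only a minimal sketch in base R: the helper name safe_shapiro_p and the data frame demo are made up for illustration and stand in for the real final_combined_data, which is not shown in the question.

```r
# Hypothetical helper (not part of any package): returns the Shapiro-Wilk
# p-value, or NA when the number of non-missing values is outside the
# range that shapiro.test() accepts (3 to 5000).
safe_shapiro_p <- function(x) {
  x <- x[!is.na(x)]
  n <- length(x)
  if (n < 3 || n > 5000) {
    return(NA_real_)
  }
  shapiro.test(x)$p.value
}

# Made-up data standing in for final_combined_data:
# group "a" is too small, group "c" is too large.
set.seed(1)
demo <- data.frame(
  PPM    = rep(c("a", "b", "c"), times = c(2, 100, 6000)),
  Metric = rnorm(6102)
)

# Non-missing observations per group: shows which groups are out of range
group_sizes <- tapply(demo$Metric, demo$PPM, function(x) sum(!is.na(x)))
print(group_sizes)

# Guarded p-values per group (NA where the test cannot be run)
p_values <- tapply(demo$Metric, demo$PPM, safe_shapiro_p)
print(p_values)
```

Because the helper takes a plain numeric vector, it could presumably be called inside summarise() in place of shapiro.test(Metric)$p.value, with the NA rows handled separately, for example by inspecting those groups with a Q-Q plot.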

EDIT: Consider this illustration:

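# Simulate one normal and one log-normal (non-normal) sample
v1 <- rnorm(500)
v2 <- exp(rnorm(500))

# Draw the two Q-Q plots side by side
par(mfrow = c(1, 2), xpd = FALSE)
qqnorm(v1, main = "Q-Q plot: v1", cex.main = 0.85)
qqline(v1, col = "blue")
qqnorm(v2, main = "Q-Q plot: v2", cex.main = 0.85)
qqline(v2, col = "blue")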

The resulting plots clearly show which variable is normally distributed and which is not: the points for v1 follow the reference line, while those for v2 bend away from it.
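For the Kolmogorov-Smirnov alternative mentioned in the answer, a sketch along the following lines may serve as a starting point. The sample names big_normal and big_skewed are made up for illustration, and the samples are deliberately larger than 5000. One caveat: comparing against a normal distribution whose mean and standard deviation are estimated from the same sample makes the reported p-value conservative, so treat it as a rough guide rather than an exact test.

```r
# Made-up samples, each larger than shapiro.test() would accept
set.seed(42)
big_normal <- rnorm(6000)        # normal
big_skewed <- exp(rnorm(6000))   # log-normal, clearly non-normal

# Compare each sample with a normal distribution that has the sample's
# own mean and standard deviation (parameters estimated from the data,
# so the p-values are only approximate).
ks_normal <- ks.test(big_normal, "pnorm", mean(big_normal), sd(big_normal))
ks_skewed <- ks.test(big_skewed, "pnorm", mean(big_skewed), sd(big_skewed))

print(ks_normal$p.value)
print(ks_skewed$p.value)
```

The second p-value should come out far below 0.05, matching what the Q-Q plot of a log-normal sample shows visually.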


09-15 05:06