Wilcoxon-Mann-Whitney rank sum test


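Assume there are no ties, and that the two samples come from distributions of similar shape. If the shapes are not similar, comparing ranks is not very meaningful.

A further remark: rank-based nonparametric tests become quite unreliable as tests of the location parameter in two situations. The first is when the shapes differ, for example when one distribution is strongly skewed or the scatter plots of the two samples look very different. The second is when the two distributions differ noticeably in spread.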

X has m observations and Y has n observations. $\mu_1$ and $\mu_2$ denote the location parameters of X and Y.

$H_0:\mu_1=\mu_2 \qquad H_1:\mu_1\neq\mu_2$

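Define $R_i$ as the rank of $Y_i$ in the pooled sample: $R_i=\#(X_j<Y_i,\ j \in I_m) + \#(Y_k \le Y_i,\ k\in I_n)$. Here $I_m=\{1,\dots,m\}$, and $\#(\cdot)$ counts the indices that satisfy the condition, so it is a sum of indicator functions. Because there are no ties, the second count includes $Y_i$ itself.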

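$\qquad \qquad W_Y = \sum_{i=1}^{n} R_i = \#(X_j<Y_i,\ j \in I_m,\ i \in I_n) + \frac{n(n+1)}{2}$

The second term arises because, summed over $i$, the counts $\#(Y_k \le Y_i)$ take the values $1, 2, \dots, n$.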

\[\qquad \qquad W_{XY} = \#(X_i < Y_j,\ i \in I_m,\ j \in I_n)
\]

This gives $W_{XY} = W_Y - \frac{n(n+1)}{2}$. In the same way, $W_{YX} = \#(Y_j < X_i,\ i \in I_m,\ j \in I_n) = W_X - \frac{m(m+1)}{2}$.

The two rank sums together add up to the sum of all ranks $1,\dots,m+n$: $W_X + W_Y = \frac{(m+n)(m+n+1)}{2}$

Hence $W_{XY}+ W_{YX} = mn$. These two quantities are called the Mann-Whitney statistics.
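The identities above can be checked on the feeding-experiment data from the example later in this section. This sketch takes X = weight.high (m = 12) and Y = weight.low (n = 7); the variable names are my own. In R, the W reported by `wilcox.test(x, y)` is the count $W_{YX}$ of pairs with $Y_j < X_i$, which equals $W_X - m(m+1)/2$.

```r
# Feeding-experiment data from the example below (weight gain in g)
x <- c(134, 146, 104, 119, 124, 161, 107, 83, 113, 129, 97, 123)  # X: high protein
y <- c(70, 118, 101, 85, 112, 132, 94)                             # Y: low protein
m <- length(x)  # 12
n <- length(y)  # 7

# Ranks in the pooled sample (there are no ties in these data)
r <- rank(c(x, y))
W_X <- sum(r[1:m])               # rank sum of X
W_Y <- sum(r[(m + 1):(m + n)])   # rank sum of Y

# Mann-Whitney counts over all m * n pairs
W_XY <- sum(outer(x, y, "<"))    # pairs with X_i < Y_j
W_YX <- sum(outer(y, x, "<"))    # pairs with Y_j < X_i

print(c(W_X = W_X, W_Y = W_Y, W_XY = W_XY, W_YX = W_YX))
# W_X = 140, W_Y = 50, W_XY = 22, W_YX = 62

# The identities derived above
stopifnot(W_XY == W_Y - n * (n + 1) / 2,
          W_YX == W_X - m * (m + 1) / 2,
          W_X + W_Y == (m + n) * (m + n + 1) / 2,
          W_XY + W_YX == m * n)
```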

Under the null hypothesis, the ranks $R_1,\dots,R_n$ are identically distributed but not independent.

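Their joint distribution follows from one fact: under $H_0$, every assignment of the ranks $1,\dots,m+n$ to the pooled observations is equally likely. For example, for $i \neq j$: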

$P(R_i=k,R_j = l)= \frac{1}{(m+n)(m+n-1)}, k \neq l$

From this we obtain

$E(W_Y)=\frac{n(m+n+1)}{2} \qquad Var(W_Y)= \frac{mn(m+n+1)}{12}$

$E(W_{XY})=\frac{mn}{2} \qquad\qquad Var(W_{XY}) = \frac{mn(m+n+1)}{12}$
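These moments follow from two ingredients: the marginal distribution $P(R_i=k)=\frac{1}{m+n}$ and the joint distribution above. Here is a short derivation, writing $N=m+n$:

\[
\begin{aligned}
E(R_i) &= \frac{1}{N}\sum_{k=1}^{N}k = \frac{N+1}{2}, \qquad
Var(R_i) = \frac{1}{N}\sum_{k=1}^{N}k^2-\Big(\frac{N+1}{2}\Big)^2 = \frac{N^2-1}{12},\\
Cov(R_i,R_j) &= \frac{1}{N(N-1)}\sum_{k\neq l}\Big(k-\frac{N+1}{2}\Big)\Big(l-\frac{N+1}{2}\Big) = -\frac{1}{N(N-1)}\sum_{k=1}^{N}\Big(k-\frac{N+1}{2}\Big)^2 = -\frac{N+1}{12}, \quad i\neq j,\\
E(W_Y) &= n\,E(R_i) = \frac{n(N+1)}{2},\\
Var(W_Y) &= n\,Var(R_i) + n(n-1)\,Cov(R_i,R_j) = \frac{n(N+1)\big[(N-1)-(n-1)\big]}{12} = \frac{mn(m+n+1)}{12}.
\end{aligned}
\]

The covariance step uses $\sum_{k}\big(k-\frac{N+1}{2}\big)=0$. Since $W_{XY}=W_Y-\frac{n(n+1)}{2}$ differs from $W_Y$ only by a constant, $E(W_{XY})=\frac{n(N+1)}{2}-\frac{n(n+1)}{2}=\frac{mn}{2}$ and $Var(W_{XY})=Var(W_Y)$.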

In practice, compute $W_Y$ and $W_X$, derive the corresponding $W_{XY}$ or $W_{YX}$, and compare it with the critical values in the table.
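The critical-value table is simply the exact null distribution of $W_{XY}$. Under $H_0$, each set of n ranks for Y is equally likely, with probability $1/\binom{m+n}{n}$, so for small m and n the distribution can be enumerated directly. The sketch below does this for the illustrative sizes m = 4 and n = 3, chosen only to keep the enumeration small. It checks the mean and variance formulas above and compares the result with base R's `dwilcox()` and `pwilcox()`. The helper `upper_tail()` is a hypothetical name of my own.

```r
# Illustrative sample sizes (small so that all rank sets can be listed)
m <- 4
n <- 3
N <- m + n

# Each column is one equally likely set of ranks for Y under H0
rank_sets <- combn(N, n)
W_Y  <- colSums(rank_sets)
W_XY <- W_Y - n * (n + 1) / 2   # Mann-Whitney statistic, values 0..m*n

# Exact null distribution of W_XY: P(W_XY = w) for w = 0, ..., m*n
p_exact <- tabulate(W_XY + 1, nbins = m * n + 1) / choose(N, n)

# Mean and (population) variance, to compare with n(N+1)/2 = 12 and mn(N+1)/12 = 8
mean_W_Y <- mean(W_Y)
var_W_Y  <- mean((W_Y - mean_W_Y)^2)

# Same distribution from R's built-in density of the Mann-Whitney statistic
all.equal(p_exact, dwilcox(0:(m * n), m, n))

# Hypothetical helper: upper-tail probability P(W_XY >= w), as read from a table
upper_tail <- function(w) sum(p_exact[(w + 1):(m * n + 1)])
upper_tail(10)
```

Because the distribution is symmetric about $mn/2$, the lower tail $P(W_{XY} \le w)$ equals the upper tail $P(W_{XY} \ge mn - w)$. A two-sided p-value is twice the smaller tail.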

R code explained:

wilcox.test(x,...)

x       numeric vector

y       optional numeric vector

alternative        "two.sided" (default), "greater" or "less"

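paired      logical. If TRUE, a paired-sample test is performed: the Wilcoxon signed rank test on x - y. With mu = 1, for example, it tests whether the location of x - y is 1. The two vectors must therefore have the same length.

            If FALSE (the default), the Mann-Whitney (rank sum) test is performed.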

mu  the value under the null hypothesis (default 0). For paired = TRUE, see above. For paired = FALSE, it specifies the location shift of x relative to y in the Mann-Whitney test.

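exact	a logical indicating whether an exact p-value should be computed. If it is not specified, an exact p-value is computed when both samples have fewer than 50 values and there are no ties.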

correct    	a logical indicating whether to apply continuity correction in the normal approximation for the p-value (default TRUE).

conf.int	a logical indicating whether a confidence interval should be computed.

conf.level	 confidence level of the interval.

The remaining arguments are not needed here.
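The paired = TRUE / mu description above can be illustrated directly. With paired = TRUE, R runs the one-sample signed rank test on the differences x - y. Testing mu = 1 in paired mode should therefore match the signed rank test on x - y with mu = 1. The vectors below are hypothetical numbers chosen only for illustration, with no zeros or ties among the shifted differences.

```r
# Hypothetical paired measurements (illustration only, not real data)
before <- c(12.1, 14.3, 11.8, 15.2, 13.9, 12.7)
after  <- c(10.2, 12.9, 11.5, 12.8, 11.3, 12.0)

# Paired test of H0: location of (before - after) equals 1
res_paired <- wilcox.test(before, after, paired = TRUE, mu = 1)

# The same test run directly on the differences
res_diff <- wilcox.test(before - after, mu = 1)

res_paired$statistic  # V = 17
res_diff$statistic    # V = 17

# With paired = TRUE, vectors of unequal length are rejected with an error
try(wilcox.test(before, after[-1], paired = TRUE))
```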

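Example:

Two groups of rats were fed two different diets, one high in protein and one low in protein. The data are the weight gains in grams.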

weight.high <- c(134,146,104,119,124,161,107,83,113,129,97,123)

weight.low <- c(70,118,101,85,112,132,94)

wilcox.test(weight.high,weight.low)

Result:

	Wilcoxon rank sum test

data:  weight.high and weight.low

W = 62, p-value = 0.1003

alternative hypothesis: true location shift is not equal to 0

This is equivalent to the following call with the arguments written out explicitly:

wilcox.test(weight.high,weight.low,mu=0,paired = FALSE, exact = TRUE)

	Wilcoxon rank sum test

data:  weight.high and weight.low

W = 62, p-value = 0.1003

alternative hypothesis: true location shift is not equal to 0
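Setting exact = FALSE instead makes R use the normal approximation, with the mean $mn/2$ and variance $mn(m+n+1)/12$ derived above. When correct = TRUE, it also applies a continuity correction that moves W by 0.5 toward its mean. The sketch below reproduces that approximate p-value by hand. The variance formula is written for the no-ties case, which holds for these data.

```r
weight.high <- c(134, 146, 104, 119, 124, 161, 107, 83, 113, 129, 97, 123)
weight.low  <- c(70, 118, 101, 85, 112, 132, 94)
m <- length(weight.high)
n <- length(weight.low)

# W as printed by R: number of pairs with weight.high > weight.low
W <- sum(outer(weight.high, weight.low, ">"))

mu_W    <- m * n / 2                        # E(W) under H0
sigma_W <- sqrt(m * n * (m + n + 1) / 12)   # sd of W under H0 (no ties)

# Continuity correction of 0.5 toward the mean, then a two-sided p-value
z <- (W - mu_W - sign(W - mu_W) * 0.5) / sigma_W
p_normal <- 2 * pnorm(-abs(z))

approx_R <- wilcox.test(weight.high, weight.low, exact = FALSE, correct = TRUE)
c(by_hand = p_normal, R = approx_R$p.value)
```

The approximate p-value is close to, but not identical with, the exact p-value of 0.1003. This is why R computes the exact p-value by default for small samples without ties.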

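The call below gives the most complete output. Several of its arguments can be omitted. mu = 0, paired = FALSE and correct = TRUE are already the defaults. exact = TRUE matches the default behavior for these data, because the samples are small and contain no ties. Only conf.int and conf.level actually need to be written: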

wilcox.test(weight.high,weight.low,mu=0,paired = FALSE, exact = TRUE, correct = TRUE, conf.int = TRUE,conf.level = 0.95)

	Wilcoxon rank sum test

data:  weight.high and weight.low

W = 62, p-value = 0.1003

alternative hypothesis: true location shift is not equal to 0

95 percent confidence interval:

 -5 40

sample estimates:

difference in location

                  17.5
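The "difference in location" estimate of 17.5 is the median of all m × n pairwise differences $X_i - Y_j$, also known as the Hodges-Lehmann estimator. It is not the difference between the two sample medians. In the exact case it can be reproduced directly:

```r
weight.high <- c(134, 146, 104, 119, 124, 161, 107, 83, 113, 129, 97, 123)
weight.low  <- c(70, 118, 101, 85, 112, 132, 94)

# All m * n pairwise differences X_i - Y_j
diffs <- as.vector(outer(weight.high, weight.low, "-"))
hl <- median(diffs)   # 17.5, the median of the 84 differences

res <- wilcox.test(weight.high, weight.low, conf.int = TRUE, conf.level = 0.95)
unname(res$estimate)  # 17.5

# For comparison: the difference of sample medians (121 - 101 = 20)
median(weight.high) - median(weight.low)
```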

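p-value = 0.1003 > 0.05, so at the 5% significance level we cannot reject the null hypothesis that the location shift x - y is 0. This agrees with the 95% confidence interval (-5, 40), which contains 0.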

Before running the test, we can first draw a scatter plot of each sample:

plot(weight.high,c(1:12))

plot(weight.low,c(1:7))

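The samples are very small. The plots show that the difference is indeed not large, and the points are scattered quite irregularly.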
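Plotting each sample against its index, as above, mixes in the arbitrary order of the observations. The similar-shape assumption from the start of this section is easier to judge when both samples share one value axis. One possible sketch uses base R's `stripchart()` and `boxplot()`:

```r
weight.high <- c(134, 146, 104, 119, 124, 161, 107, 83, 113, 129, 97, 123)
weight.low  <- c(70, 118, 101, 85, 112, 132, 94)
groups <- list(high = weight.high, low = weight.low)

op <- par(mfrow = c(1, 2))
# Dot plot: every observation on a common weight-gain axis
stripchart(groups, method = "stack", pch = 19,
           xlab = "weight gain (g)", main = "Dot plot")
# Box plots: compare medians, spread and skewness side by side
b <- boxplot(groups, ylab = "weight gain (g)", main = "Box plot")
par(op)
```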

Next update: R code for the bootstrap method.