Ref:https://onlinecourses.science.psu.edu/stat464/print/book/export/html/14


Estimating the CDF

The Empirical CDF


Plot the empirical CDF:

x = c(4, 0, 3, 2, 2)
plot.ecdf(x)

[Figure: plot of the empirical CDF of x]
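As a minimal sketch of what the plot is built from: the empirical CDF at a point t is the fraction of observations less than or equal to t. The helper name ecdf_manual and the evaluation grid t_grid are illustrative assumptions, not part of the original notes.

```r
# Sample from the text
x <- c(4, 0, 3, 2, 2)

# ecdf() returns a step function F_n that can be evaluated at any point
Fn <- ecdf(x)

# Hypothetical helper: the same quantity computed from the definition,
# F_n(t) = (number of observations <= t) / n
ecdf_manual <- function(t, data) {
  sapply(t, function(ti) mean(data <= ti))
}

t_grid <- c(-1, 0, 1, 2, 3, 4, 5)
print(Fn(t_grid))               # 0.0 0.2 0.2 0.6 0.8 1.0 1.0
print(ecdf_manual(t_grid, x))   # same values
```

The jump at 2 is twice as high as the others because the value 2 occurs twice in x.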

Kolmogorov-Smirnov test

Tests the "sameness" of two independent samples drawn from continuous distributions; the one-sample version compares a single sample against a specified distribution.

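A small p-value is evidence that the distributions differ, but a large p-value does not prove that they are the same.

With small samples the test has little power, so the p-value tends to be large.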
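A minimal sketch of the two-sample case, using simulated data rather than anything from the original notes. The seed, the sample sizes, and the shift of 1 between the two distributions are arbitrary assumptions. It also recomputes D by hand as the largest vertical gap between the two empirical CDFs.

```r
set.seed(1)                      # arbitrary seed, only for reproducibility
a <- rnorm(50)                   # simulated sample from N(0, 1)
b <- rnorm(50, mean = 1)         # simulated sample from N(1, 1)

res <- ks.test(a, b)             # two-sample, two-sided test
print(res$statistic)
print(res$p.value)

# D is the largest vertical gap between the two empirical CDFs;
# it is attained at one of the observed data points
Fa <- ecdf(a)
Fb <- ecdf(b)
pooled <- c(a, b)
D_manual <- max(abs(Fa(pooled) - Fb(pooled)))
print(D_manual)
```

Rerunning this with much smaller samples (say 5 per group) is one way to see the p-value grow even though the underlying distributions still differ.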

> x = c(4, 0, 3, 2, 2)
> plot.ecdf(x)
> plot(ecdf(x))
> ecdf(x)
Empirical CDF
Call: ecdf(x)
 x[1:4] =      0,      2,      3,      4
> ks.test(x, y="pnorm", mean(x), sd(x))

        One-sample Kolmogorov-Smirnov test

data:  x
D = 0.24637, p-value = 0.9219
alternative hypothesis: two-sided

Warning message:
In ks.test(x, y = "pnorm", mean(x), sd(x)) :
  ties should not be present for the Kolmogorov-Smirnov test
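The D reported above can be reproduced by hand. This is a sketch under the assumption that the one-sample statistic is the largest gap between the fitted normal CDF and the ECDF step heights at the sorted observations. The variable names F0 and D are illustrative. Note that the R documentation for ks.test says the parameters of the reference distribution should be specified in advance rather than estimated from the data, so the p-value in this transcript should be read with caution.

```r
x <- c(4, 0, 3, 2, 2)
n <- length(x)

# Fitted normal CDF evaluated at the sorted observations
F0 <- pnorm(sort(x), mean = mean(x), sd = sd(x))

# Compare F0 with the ECDF step heights (i - 1)/n and i/n
# at each sorted observation and keep the largest gap
D <- max(c(F0 - (0:(n - 1)) / n, (1:n) / n - F0))
print(round(D, 5))   # 0.24637, matching the transcript
```

The warning about ties arises because the value 2 appears twice in x, which cannot happen (with probability 1) under a continuous distribution.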

Ps:

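R has four functions for the normal distribution: dnorm is the PDF, pnorm is the CDF, qnorm is the inverse cumulative distribution function (the quantile function), and rnorm generates random numbers.

Given a probability, qnorm returns the value at which the CDF equals that probability. For the standard normal distribution, that value is the Z-score.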

dnorm(x, mean = 0, sd = 1, log = FALSE)
pnorm(q, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE)
qnorm(p, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE)
rnorm(n, mean = 0, sd = 1)
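A short sketch of how the four functions relate. The specific inputs (1.96, 0.975, a mean of 5, an sd of 2, the seed) are arbitrary choices for illustration.

```r
p  <- pnorm(1.96)   # P(Z <= 1.96) for the standard normal, about 0.975
z  <- qnorm(0.975)  # value whose CDF is 0.975, about 1.96 (a Z-score)
d0 <- dnorm(0)      # density at 0, equal to 1 / sqrt(2 * pi)

# qnorm undoes pnorm, also for a non-standard normal
q_back <- qnorm(pnorm(7, mean = 5, sd = 2), mean = 5, sd = 2)

set.seed(42)        # arbitrary seed, only for reproducibility
draws <- rnorm(1000, mean = 5, sd = 2)

print(c(p = p, z = z, d0 = d0, q_back = q_back))
print(mean(draws))  # should be close to 5
```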

Density Estimation  

> x
[1] 4 0 3 2 2
> density(x)

Call:
        density.default(x = x)

Data: x (5 obs.);       Bandwidth 'bw' = 0.4868

       x                 y
 Min.   :-1.4604   Min.   :0.001837
 1st Qu.: 0.2698   1st Qu.:0.059033
 Median : 2.0000   Median :0.141129
 Mean   : 2.0000   Mean   :0.144277
 3rd Qu.: 3.7302   3rd Qu.:0.205314
 Max.   : 5.4604   Max.   :0.351014
> plot(density(x))

[Figures: kernel density estimates of x plotted with different bandwidths]

Passing a bandwidth argument (bw) to density(x) changes the plot, as the figures above show.
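A minimal sketch of the bandwidth effect. The values bw = 0.2 and bw = 1 are arbitrary choices for illustration, and kde_manual is a hypothetical helper that evaluates a Gaussian kernel estimate directly from its definition. It is not how density() computes its result internally, since density() works on a grid, so the two agree only approximately.

```r
x <- c(4, 0, 3, 2, 2)

d_default <- density(x)            # default bandwidth (rule bw.nrd0)
d_narrow  <- density(x, bw = 0.2)  # less smoothing, bumpier curve
d_wide    <- density(x, bw = 1)    # more smoothing, flatter curve

print(c(d_default$bw, d_narrow$bw, d_wide$bw))

# Hypothetical helper: Gaussian kernel density estimate from the definition,
# f(t) = (1 / (n * h)) * sum_i dnorm((t - x_i) / h)
kde_manual <- function(t, data, h) {
  sapply(t, function(ti) mean(dnorm((ti - data) / h)) / h)
}
print(kde_manual(2, x, d_default$bw))

# A narrower bandwidth gives a taller, sharper peak
print(c(max(d_narrow$y), max(d_default$y), max(d_wide$y)))
```

To compare the curves visually, call plot(d_narrow) and then lines(d_default) and lines(d_wide) to overlay them on the same axes.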

  

  
