画曼哈顿图和QQ plot 首推R包“qqman”,简约方便。下面具体介绍以下。
一、画曼哈顿图
install.packages("qqman") library(qqman)
1、准备包含SNP, CHR, BP, P的文件gwasResults(如果没有zscore可以不用管),如下所示:
2、上代码,如下所示:
manhattan(gwasResults)
如果觉得不够美观,考虑添加一下参数:
manhattan(gwasResults, main = "Manhattan Plot", ylim = c(0, 10), cex = 0.6, cex.axis = 0.9, col = c("blue4", "orange3"), suggestiveline = F, genomewideline = 6, chrlabs = c(1:20, "P", "Q"))
二、画 QQ plot 图
直接上代码:
qq(gwasResults$P)
同样的,还可以修改参数,美观一下:
qq(gwasResults$P, main = "Q-Q plot of GWAS p-values", xlim = c(0, 7), ylim = c(0, 12), pch = 18, col = "blue4", cex = 1.5, las = 1)
三、计算膨胀系数(Calculating Genomic Inflation Factor)
如果数据类型为Z值,则膨胀系数为:
z = gwasResults$zscore
lambda = round(median(z^2) / 0.454, 3)
如果数据类型是P值,则膨胀系数为:
p_value=gwasResults$P z = qnorm(p_value/ 2) lambda = round(median(z^2, na.rm = TRUE) / 0.454, 3)
如果数据类型是CHISQ值,则膨胀系数为:
z = gwasResults$CHISQ lambda = round(median(z^2, na.rm = TRUE) / 0.454, 3)
#关于0.454的由来:
#qchisq(0.5, 1)
#[1] 0.4549364
膨胀系数lambda的解读:
基因组膨胀因子λ定义为经验观察到的检验统计分布与预期中位数的中值之比,从而量化了因大量膨胀而造成结果的假阳性率。换句话说,λ定义为得到的卡方检验统计量的中值除以卡方分布的预期中值。预期的P值膨胀系数为1,当实际膨胀系数越偏离1,说明存在群体分层的现象越严重,容易有假阳性结果,需要重新矫正群体分层。
参考链接:
https://www.biostars.org/p/43328/
https://cran.r-project.org/web/packages/qqman/vignettes/qqman.html
https://en.wikipedia.org/wiki/Population_stratification
chrome-extension://cdonnmffkdaoajfknoeeecmchibpmkmg/static/pdf/web/viewer.html?file=https%3A%2F%2Fpersonal.broadinstitute.org%2Fchrisnc%2Fothers%2FGWAS_Smith_MethMolBiol09.pdf
chrome-extension://cdonnmffkdaoajfknoeeecmchibpmkmg/static/pdf/web/viewer.html?file=http%3A%2F%2Fwww3.stat.sinica.edu.tw%2Fsisg2015%2Fdownload%2Fhandout%2FS1-notes.pdf