Tajima F (1989) Genetics 123:585-595

1  Average number of pairwise nucleotide differences

\( \hat k = \frac{\sum\sum_{i<j} k_{ij}}{\binom{n}{2}} \)

\( k_{ij} \), the number of nucleotide differences between the \( i \)-th and \( j \)-th DNA sequences; \( n \), the number of sequences in the sample
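
The definition of \( \hat k \) can be sketched in a few lines of Python. This is only an illustration, assuming the sequences are already aligned and of equal length; the function name `mean_pairwise_differences` is hypothetical and not part of any library.

```python
from itertools import combinations


def mean_pairwise_differences(seqs):
    """Average number of pairwise nucleotide differences (k-hat).

    Assumes `seqs` is a list of aligned, equal-length sequences.
    """
    n = len(seqs)
    if n < 2:
        raise ValueError("need at least two sequences")
    # k_ij: number of positions at which sequences i and j differ,
    # summed over all pairs with i < j
    total = sum(
        sum(a != b for a, b in zip(s1, s2))
        for s1, s2 in combinations(seqs, 2)
    )
    # Divide by the number of pairs, C(n, 2) = n(n-1)/2
    return total / (n * (n - 1) / 2)
```

Enumerating all pairs costs on the order of \( n^2 \) comparisons per site, which is fine for small samples; for large samples a per-site allele-count approach is usually preferred.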

2  D statistic

\( D = \frac{\hat k - \frac{S}{a_1}}{\sqrt{e_1 S + e_2 S (S-1)}} \)

\( a_1 = \sum \limits_{i=1}^{n-1} \frac{1}{i} \)

\( a_2 = \sum \limits_{i=1}^{n-1} \frac{1}{i^2} \)

\( b_1 = \frac{n+1}{3(n-1)} \)

\( b_2 = \frac{2(n^2+n+3)}{9n(n-1)} \)

\( c_1 = b_1 - \frac{1}{a_1} \)

\( c_2 = b_2 - \frac{n+2}{a_1 n} + \frac{a_2}{a_1^2} \)

\( e_1 = \frac{c_1}{a_1} \)

\( e_2 = \frac{c_2}{a_1^2 + a_2} \)

\( S \), the number of segregating (or polymorphic) sites in the sample
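
As a rough sketch, the constants above can be chained together as follows. The function name `tajimas_d` and its handling of edge cases are assumptions of this example (returning NaN when \( S = 0 \), and rejecting \( n < 4 \), where \( e_1 \) and \( e_2 \) are both zero and the denominator vanishes); they are not taken from the paper.

```python
import math


def tajimas_d(n, S, k_hat):
    """Tajima's D from sample size n, segregating sites S, and k-hat."""
    if n < 4:
        # For n = 2 or 3 both e1 and e2 are zero, so D is undefined
        raise ValueError("n must be at least 4")
    if S == 0:
        return float("nan")  # no polymorphism: D is undefined

    # Harmonic-type sums over i = 1 .. n-1
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))

    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))

    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2

    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    # Numerator: difference between the two estimators of theta;
    # denominator: estimated standard deviation of that difference
    return (k_hat - S / a1) / math.sqrt(e1 * S + e2 * S * (S - 1))
```

A call such as `tajimas_d(5, 1, 0.6)` evaluates the statistic for a sample of five sequences with a single segregating site.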

3  Computation

Carlson CS, et al. (2005) Genome Res 15:1553-1565

Tajima's D in bins of 100,000 bp, computed with vcftools:

vcftools --vcf geno.vcf --TajimaD 100000

4  Example

One SNP marker, 5 individuals

A A C C C

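Pairwise genotype differences between individuals (\( d = 1 \) if individuals \( i \) and \( j \) carry different alleles, 0 otherwise)

i  j  d
1  2  0
1  3  1
1  4  1
1  5  1
2  3  1
2  4  1
2  5  1
3  4  0
3  5  0
4  5  0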

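With \( n_1 = 2 \) copies of A and \( n_2 = 3 \) copies of C among \( n = 5 \) individuals, only the \( n_1 n_2 = 6 \) pairs carrying different alleles contribute to the sum:

\( \pi = \frac{n_1 n_2}{n(n-1)/2} = \frac{2 n_1 n_2}{n(n-1)} = \frac{2 \times 2 \times 3}{5 \times 4} = 0.6 \)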
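
The example can be checked with a short script that enumerates every pair and compares the result with the allele-count shortcut. This is a minimal sketch; the variable names (`alleles`, `pairs`, `pi_pairs`, `pi_counts`) are chosen here for illustration only.

```python
from itertools import combinations

alleles = ["A", "A", "C", "C", "C"]  # the five individuals from the example
n = len(alleles)

# Enumerate every pair (i, j) with i < j, using 1-based labels;
# d is 1 if the two individuals carry different alleles, else 0
pairs = [
    (i + 1, j + 1, int(alleles[i] != alleles[j]))
    for i, j in combinations(range(n), 2)
]
pi_pairs = sum(d for _, _, d in pairs) / len(pairs)

# Shortcut: only pairs with one copy of each allele contribute
n1, n2 = alleles.count("A"), alleles.count("C")
pi_counts = 2 * n1 * n2 / (n * (n - 1))

print(pi_pairs, pi_counts)  # both values are 0.6
```

The shortcut avoids building the pair list at all, which matters when the same calculation is repeated over many sites.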
