Tong Liping, Yang Jie, Cooper Richard S
Department of Mathematics and Statistics, Loyola University Chicago, Chicago, IL 60660, USA.
Ann Hum Genet. 2010 May;74(3):275-85. doi: 10.1111/j.1469-1809.2010.00574.x.
We address the asymptotic and approximate distributions of a large class of quadratic-form test statistics used in association studies. The statistics of interest take the general form D = X^T A X, where A is a general similarity matrix that may or may not be positive semi-definite, and X follows a multivariate normal distribution with mean μ and variance matrix Σ, where Σ may or may not be singular. We show that D can be written as a linear combination of independent χ² random variables plus a shift. Furthermore, its distribution can be approximated by a χ² distribution or by the difference of two χ² distributions. In the setting of association testing, our methods are especially useful in two situations. First, when the required significance level is much smaller than 0.05, as in a genome scan, estimating p-values with permutation procedures can be challenging. Second, when an EM algorithm is needed to infer haplotype frequencies from unphased genotype data, a permutation procedure can be computationally intensive. In either situation, an efficient and accurate procedure for estimating p-values would be useful. Our method can be applied to any quadratic-form statistic and therefore should be of general interest.
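The χ² decomposition described in the abstract can be illustrated with a short NumPy sketch. It is not the authors' code. The function names `chi2_representation` and `two_chi2_approx` are hypothetical, and the approximation step is a standard two-moment (Satterthwaite-type) match applied separately to the positive and negative weights, which may differ from the approximation used in the paper. The sketch factors a possibly singular Σ through its nonzero eigenvalues, diagonalizes the induced quadratic form, and completes the square. Each nonzero eigenvalue then contributes a scaled noncentral χ²₁ term, and the remaining constants form the shift. If some eigenvalue is zero while μ ≠ 0, a leftover normal term can remain. That term vanishes when μ = 0.

```python
import numpy as np


def chi2_representation(A, mu, Sigma, tol=1e-10):
    """Exact representation of D = X^T A X with X ~ N(mu, Sigma):

        D = sum_j w_j * chi2_1(delta_j) + shift + N(0, normal_sd^2),

    with all terms independent. Sigma may be singular, and A need not be PSD.
    The normal term vanishes when mu = 0 (e.g. under a typical null).
    """
    A = np.asarray(A, dtype=float)
    mu = np.asarray(mu, dtype=float)
    Sigma = np.asarray(Sigma, dtype=float)
    A_s = (A + A.T) / 2.0  # X^T A X depends only on the symmetric part of A

    # Factor Sigma = L L^T using only its nonzero eigenvalues (rank r).
    s, U = np.linalg.eigh(Sigma)
    keep = s > tol * max(s.max(), 1.0)
    L = U[:, keep] * np.sqrt(s[keep])

    # X = mu + L Z with Z ~ N(0, I_r); rotate W = P^T Z, where L^T A_s L = P diag(lam) P^T.
    lam, P = np.linalg.eigh(L.T @ A_s @ L)
    c = P.T @ (L.T @ A_s @ mu)  # D = sum lam_j W_j^2 + 2 sum c_j W_j + mu^T A_s mu

    nz = np.abs(lam) > tol * max(np.abs(lam).max(initial=0.0), 1.0)
    w = lam[nz]
    delta = (c[nz] / w) ** 2  # complete the square: lam (W + c/lam)^2 - c^2/lam
    shift = float(mu @ A_s @ mu - np.sum(c[nz] ** 2 / w))
    normal_sd = float(2.0 * np.sqrt(np.sum(c[~nz] ** 2)))  # zero-eigenvalue leftovers
    return w, delta, shift, normal_sd


def two_chi2_approx(w, delta):
    """Match the mean and variance of the positive-weight and negative-weight
    parts separately to a * chi2_nu (Satterthwaite-type), giving
    D approx shift + a_pos * chi2_{nu_pos} - a_neg * chi2_{nu_neg}.
    Any leftover normal term is ignored here."""
    params = []
    for sel in (w > 0, w < 0):
        wa, d = np.abs(w[sel]), delta[sel]
        if wa.size == 0:
            params.append((0.0, 0.0))  # this part is absent
            continue
        m = np.sum(wa * (1 + d))               # E[w chi2_1(d)] = w (1 + d)
        v = 2 * np.sum(wa ** 2 * (1 + 2 * d))  # Var[w chi2_1(d)] = 2 w^2 (1 + 2 d)
        params.append((float(v / (2 * m)), float(2 * m ** 2 / v)))
    return params  # [(a_pos, nu_pos), (a_neg, nu_neg)]


if __name__ == "__main__":
    # Hypothetical toy inputs: a rank-2 Sigma whose rows sum to zero,
    # and an indefinite A.
    Sigma = np.array([[1.0, -0.5, -0.5], [-0.5, 1.0, -0.5], [-0.5, -0.5, 1.0]])
    A = np.array([[1.0, 0.5, 0.0], [0.5, -1.0, 0.0], [0.0, 0.0, 0.2]])
    mu = np.array([0.3, -0.1, 0.0])
    w, delta, shift, sd = chi2_representation(A, mu, Sigma)
    print("weights", w, "noncentralities", delta, "shift", shift, "normal sd", sd)
    print("two-chi2 approximation", two_chi2_approx(w, delta))
```

When all weights are positive, the approximation reduces to a single scaled χ². A tail probability can then be read off with any χ² survival function, for example `scipy.stats.chi2.sf((t - shift) / a_pos, nu_pos)` (SciPy is not used in the sketch itself).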