Yang Yaning, Zhang Jingshan, Hoh Josephine, Matsuda Fumihiko, Xu Peng, Lathrop Mark, Ott Jurg
Laboratory of Statistical Genetics, The Rockefeller University, New York, NY 10021, USA.
Proc Natl Acad Sci U S A. 2003 Jun 10;100(12):7225-30. doi: 10.1073/pnas.1237858100. Epub 2003 May 30.
The efficiency of single-nucleotide polymorphism (SNP) haplotype analysis may be increased by DNA pooling, which can dramatically reduce the number of genotyping assays required. We develop a method for obtaining maximum likelihood estimates of haplotype frequencies for different pool sizes, assess the accuracy of these estimates, and show that pooling DNA samples is an efficient way to estimate haplotype frequencies. Although pooling K individuals increases ambiguity, the uncertainty of the estimates grows by less than a factor of K relative to unpooled DNA, at least for small pool sizes K and small numbers of loci. We also derive the asymptotic variance-covariance matrix of the maximum likelihood estimates and evaluate the accuracy of the variance estimates by Monte Carlo simulation. When the number of pools is moderately large, the asymptotic variance estimates are quite accurate. Our analysis accommodates completely or partially missing genotype information. Finally, we apply our methods to SNPs in the angiotensinogen gene.
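The abstract does not spell out how the maximum likelihood estimates are computed. As a minimal sketch, assuming biallelic SNPs, complete data, exactly known allele counts per pool, and independent haplotype draws within each pool (Hardy-Weinberg equilibrium), one way to maximize the pooled likelihood is an EM algorithm that enumerates every haplotype multiset consistent with a pool's allele counts. This is not necessarily the authors' method: all function names are hypothetical, and missing-data handling and the asymptotic variance computation are omitted.

```python
import itertools
import math
import random
from collections import Counter


def all_haplotypes(n_loci):
    """Every biallelic haplotype over n_loci SNPs, coded as tuples of 0/1."""
    return list(itertools.product((0, 1), repeat=n_loci))


def build_composition_table(haps, n_loci, n_chrom):
    """Map each observable allele-count vector to the haplotype multisets producing it.

    Each entry is (multinomial coefficient, {haplotype index: copies}).
    """
    table = {}
    for combo in itertools.combinations_with_replacement(range(len(haps)), n_chrom):
        comp = Counter(combo)
        counts = tuple(sum(haps[h][l] * c for h, c in comp.items()) for l in range(n_loci))
        coef = math.factorial(n_chrom)
        for c in comp.values():
            coef //= math.factorial(c)  # stays an exact integer at every step
        table.setdefault(counts, []).append((coef, dict(comp)))
    return table


def em_pooled_haplotypes(pools, n_loci, pool_size, max_iter=2000, tol=1e-10):
    """Hypothetical EM estimate of haplotype frequencies from per-pool allele counts.

    pools: one tuple per pool giving, for each SNP, the number of allele-1 copies
    among the 2 * pool_size chromosomes in that pool (no missing data assumed).
    """
    haps = all_haplotypes(n_loci)
    n_chrom = 2 * pool_size
    table = build_composition_table(haps, n_loci, n_chrom)
    observed = Counter(tuple(p) for p in pools)  # identical pools share one E-step
    freqs = [1.0 / len(haps)] * len(haps)  # uniform starting values
    for _ in range(max_iter):
        expected = [0.0] * len(haps)
        for obs, n_pools in observed.items():
            comps = table[obs]
            # E-step: multinomial probability of each composition consistent with obs
            weights = [coef * math.prod(freqs[h] ** c for h, c in comp.items())
                       for coef, comp in comps]
            total = sum(weights)
            for (_, comp), w in zip(comps, weights):
                for h, c in comp.items():
                    expected[h] += n_pools * c * w / total
        # M-step: expected haplotype counts divided by the total number of chromosomes
        new = [e / (n_chrom * len(pools)) for e in expected]
        delta = max(abs(a - b) for a, b in zip(new, freqs))
        freqs = new
        if delta < tol:
            break
    return dict(zip(haps, freqs))


def simulate_pools(true_freqs, n_pools, pool_size, rng):
    """Draw 2 * pool_size haplotypes per pool and keep only the allele counts."""
    haps = list(true_freqs)
    weights = [true_freqs[h] for h in haps]
    n_loci = len(haps[0])
    pools = []
    for _ in range(n_pools):
        chroms = rng.choices(haps, weights=weights, k=2 * pool_size)
        pools.append(tuple(sum(h[l] for h in chroms) for l in range(n_loci)))
    return pools


if __name__ == "__main__":
    rng = random.Random(1)
    truth = {(0, 0): 0.5, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.3}
    pools = simulate_pools(truth, n_pools=500, pool_size=2, rng=rng)
    estimate = em_pooled_haplotypes(pools, n_loci=2, pool_size=2)
    for hap in sorted(estimate):
        print(hap, round(estimate[hap], 3), "true:", truth[hap])
```

The enumeration makes the ambiguity introduced by pooling explicit: a pool of K = 2 individuals with one copy of each allele-1 at two loci per chromosome pair can arise from several haplotype multisets, and EM splits the pool's weight among them. Because the number of multisets of size 2K drawn from 2^L haplotypes is C(2^L + 2K - 1, 2K), this brute-force approach is only practical for small pools and few loci.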