Browning Sharon R, Browning Brian L
Department of Statistics, The University of Auckland, Auckland, New Zealand.
Am J Hum Genet. 2007 Nov;81(5):1084-97. doi: 10.1086/521987. Epub 2007 Sep 21.
Whole-genome association studies present many new statistical and computational challenges due to the large quantity of data obtained. One of these challenges is haplotype inference; methods for haplotype inference designed for small data sets from candidate-gene studies do not scale well to the large number of individuals genotyped in whole-genome association studies. We present a new method and software for inference of haplotype phase and missing data that can accurately phase data from whole-genome association studies, and we present the first comparison of haplotype-inference methods for real and simulated data sets with thousands of genotyped individuals. We find that our method outperforms existing methods in terms of both speed and accuracy for large data sets with thousands of individuals and densely spaced genetic markers, and we use our method to phase a real data set of 3,002 individuals genotyped for 490,032 markers in 3.1 days of computing time, with 99% of masked alleles imputed correctly. Our method is implemented in the Beagle software package, which is freely available.
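The abstract does not say why phasing is hard. The short version is that an unphased genotype fits several haplotype pairs: with k ≥ 1 heterozygous markers there are 2^(k-1) distinct unordered pairs, so brute-force enumeration explodes at whole-genome marker densities. The sketch below is only an illustration under stated assumptions. It is not Beagle's algorithm, and the function names are hypothetical. Genotypes are assumed to be coded as minor-allele counts (0, 1, 2). The sketch enumerates the compatible pairs for one individual. It also computes a masked-allele accuracy, assumed here to be the fraction of masked alleles whose imputed value equals the true value.

```python
from itertools import product


def compatible_haplotype_pairs(genotype):
    """Hypothetical helper: enumerate unordered haplotype pairs consistent
    with an unphased genotype coded as minor-allele counts (0, 1 or 2)."""
    het_sites = [i for i, g in enumerate(genotype) if g == 1]
    pairs = set()
    # Each heterozygous site can put the minor allele on either haplotype.
    for choice in product((0, 1), repeat=len(het_sites)):
        # Homozygous sites: both haplotypes carry allele g // 2 (0 or 1).
        # Heterozygous sites start as a placeholder 0 and are overwritten below.
        h1 = [g // 2 for g in genotype]
        h2 = list(h1)
        for site, allele in zip(het_sites, choice):
            h1[site] = allele
            h2[site] = 1 - allele
        # Sort so (h1, h2) and (h2, h1) count as the same unordered pair.
        pairs.add(tuple(sorted((tuple(h1), tuple(h2)))))
    return pairs


def masked_allele_accuracy(true_alleles, imputed_alleles):
    """Hypothetical helper: fraction of masked alleles imputed correctly."""
    if len(true_alleles) != len(imputed_alleles) or not true_alleles:
        raise ValueError("need two non-empty sequences of equal length")
    correct = sum(t == i for t, i in zip(true_alleles, imputed_alleles))
    return correct / len(true_alleles)


if __name__ == "__main__":
    # Two heterozygous markers give 2^(2-1) = 2 possible phasings.
    print(len(compatible_haplotype_pairs([0, 1, 2, 1])))
    # Three of four masked alleles recovered.
    print(masked_allele_accuracy([0, 1, 1, 0], [0, 1, 0, 0]))
```

The enumeration doubles with every additional heterozygous marker. That is why the abstract stresses methods that scale to hundreds of thousands of markers rather than exhaustive search.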