Andrés Aida M, Clark Andrew G, Shimmin Lawrence, Boerwinkle Eric, Sing Charles F, Hixson James E
Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY, USA.
Genet Epidemiol. 2007 Nov;31(7):659-71. doi: 10.1002/gepi.20185.
Statistical methods for haplotype inference from multi-site genotypes of unrelated individuals have important applications in association studies and population genetics. Understanding the factors that affect the accuracy of this inference is important, but their assessment has been restricted by the limited availability of biological data with known phase. We created hybrid cell lines monosomic for human chromosome 19 and produced single-chromosome complete sequences of a 48 kb genomic region in 39 individuals of African American (AA) and European American (EA) origin. We used these phase-known genotypes, together with coalescent simulations, to assess the accuracy of statistical haplotype reconstruction by several algorithms. The accuracy of phase inference in our biological data was notably low, even for regions as short as 25-50 kb, suggesting that caution is needed when analyzing reconstructed haplotypes. Moreover, the confidence estimates reported for phase inference are not reliable enough to allow site-specific uncertainty to be incorporated into subsequent analyses. We show that, in samples of mixed ancestry such as ours (AA and EA populations), the most accurate haplotypes are probably obtained by maximizing sample size through analysis of the largest, pooled sample, despite the hypothetical problems associated with pooling across heterogeneous samples. Strategies to improve confidence in reconstructed haplotypes, and realistic alternatives to the analysis of inferred haplotypes, are discussed.
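Why phase is hard to infer, and how accuracy against phase-known data can be scored, can be sketched briefly. An individual heterozygous at k biallelic sites has 2^(k-1) haplotype pairs consistent with the unphased genotype. The abstract does not state which accuracy measures were used; the sketch below assumes the switch error rate, a commonly used phasing metric, and is an illustration only, not the authors' pipeline. The function names and the 0/1/2 genotype coding are hypothetical choices for this example.

```python
from typing import Iterable, List, Tuple

Haplotype = str  # hypothetical encoding: one allele character ("0"/"1") per site

def n_phase_configurations(genotype: List[int]) -> int:
    """Count haplotype pairs consistent with an unphased genotype.

    Genotypes are coded 0/1/2 (copies of the alternate allele) per biallelic site;
    only heterozygous sites (coded 1) create phase ambiguity.
    """
    k = sum(1 for g in genotype if g == 1)
    return 1 if k == 0 else 2 ** (k - 1)

def switch_errors(true_pair: Tuple[Haplotype, Haplotype],
                  inferred_pair: Tuple[Haplotype, Haplotype]) -> Tuple[int, int]:
    """Count switch errors for one individual (assumes both pairs share one genotype).

    Returns (switches, opportunities), where opportunities = heterozygous sites - 1.
    Swapping the two haplotype labels wholesale is not counted as an error.
    """
    t1, t2 = true_pair
    i1, _ = inferred_pair
    het = [j for j in range(len(t1)) if t1[j] != t2[j]]
    if len(het) < 2:
        return 0, 0
    # At each heterozygous site: does inferred haplotype 1 carry true haplotype 1's allele?
    same = [i1[j] == t1[j] for j in het]
    # A switch is a change in that alignment between consecutive heterozygous sites.
    switches = sum(1 for a, b in zip(same, same[1:]) if a != b)
    return switches, len(het) - 1

def switch_error_rate(pairs: Iterable[Tuple[Tuple[Haplotype, Haplotype],
                                              Tuple[Haplotype, Haplotype]]]) -> float:
    """Pooled switch error rate: total switches / total opportunities."""
    total_s = total_o = 0
    for true_pair, inferred_pair in pairs:
        s, o = switch_errors(true_pair, inferred_pair)
        total_s += s
        total_o += o
    return total_s / total_o if total_o else 0.0

if __name__ == "__main__":
    print(n_phase_configurations([1, 1, 1, 1]))                 # 4 het sites
    print(switch_errors(("000", "111"), ("010", "101")))         # alternating phase
```

Comparing per-site alignment between consecutive heterozygous sites, rather than comparing whole haplotypes, makes the score insensitive to which inferred haplotype is labeled "first".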