Joint Carnegie Mellon University/University of Pittsburgh PhD Program in Computational Biology and Lane Center for Computational Biology, 4400 Fifth Avenue, Pittsburgh, PA 15213, USA.
IEEE/ACM Trans Comput Biol Bioinform. 2011 Jul-Aug;8(4):918-28. doi: 10.1109/TCBB.2011.23.
The random accumulation of variations in the human genome over time implicitly encodes a history of how human populations have arisen, dispersed, and intermixed since we emerged as a species. Reconstructing that history is a challenging computational and statistical problem, but it has important applications both to basic research and to the discovery of genotype-phenotype correlations. We present a novel approach to inferring human evolutionary history from genetic variation data. We use the idea of consensus trees, a technique generally used to reconcile species trees from divergent gene trees, adapting it to the problem of finding robust relationships within a set of intraspecies phylogenies derived from local regions of the genome. Validation on both simulated and real data shows the method to be effective in recapitulating the known true structure of the data, and its results closely match our best current understanding of human evolutionary history. Additional comparison with the results of leading methods for the problem of population substructure assignment verifies that our method provides comparable accuracy in identifying meaningful population subgroups, in addition to inferring the relationships among them. The consensus tree approach thus provides a promising new model for the robust inference of substructure and ancestry from large-scale genetic variation data.
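To make the consensus-tree idea in the abstract concrete, the following is a minimal sketch of a textbook majority-rule consensus over rooted trees. It is an illustration only and is not claimed to be the authors' algorithm. The nested-tuple tree encoding, the function names `clades` and `majority_rule_clades`, the `threshold` parameter, and the population labels A to D are all assumptions introduced here.

```python
from collections import Counter


def clades(tree):
    """Return (leaf_set, clades) for a rooted tree given as nested tuples.

    A leaf is any non-tuple label; an internal node is a tuple of children.
    The returned clades are the leaf sets under every internal node.
    """
    if not isinstance(tree, tuple):
        return frozenset([tree]), set()
    leaves = frozenset()
    found = set()
    for child in tree:
        child_leaves, child_clades = clades(child)
        leaves |= child_leaves
        found |= child_clades
    found.add(leaves)
    return leaves, found


def majority_rule_clades(trees, threshold=0.5):
    """Keep the clades that occur in more than `threshold` of the input trees."""
    counts = Counter()
    for tree in trees:
        all_leaves, found = clades(tree)
        found.discard(all_leaves)  # the root clade is trivially shared
        counts.update(found)
    cutoff = threshold * len(trees)
    return {clade for clade, n in counts.items() if n > cutoff}


# Hypothetical local phylogenies over four populations (illustrative data only).
local_trees = [
    ((("A", "B"), "C"), "D"),
    ((("A", "B"), "D"), "C"),
    (("A", "C"), ("B", "D")),
]

consensus = majority_rule_clades(local_trees)
```

In this toy input only the grouping of A with B is supported by a majority of the local trees, so it is the only clade retained. Groupings that appear in a single region of the genome are discarded as unsupported.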