Department of Computer Science and Engineering, University of Connecticut, 371 Fairfield Road, Unit 2155, Storrs, CT 06269, USA.
IEEE/ACM Trans Comput Biol Bioinform. 2011 Jan-Mar;8(1):182-93. doi: 10.1109/TCBB.2009.27.
Large amounts of population-scale genetic variation data are being collected. One potentially important biological problem is to infer the genealogical history of a population from these data. Partly because of recombination, the genealogical history of a set of DNA sequences in a population usually cannot be represented by a single tree. Instead, it is better represented by a genealogical network, which is a compact representation of a set of correlated local genealogical trees. Each tree covers a short region of the genome, and different trees may have different topologies. Inferring the genealogical history of a set of DNA sequences under recombination has many potential applications, including association mapping of complex diseases. In this paper, we present two new methods for reconstructing local tree topologies in the presence of recombination, which extend and improve previous work [1]. We first show that the "tree scan" method can be converted into a probabilistic inference method based on a hidden Markov model. We then focus on developing a novel local tree inference method, called RENT, that is both accurate and scalable to larger data sets. Through simulation, we demonstrate the usefulness of our methods by showing that the hidden-Markov-model-based method is comparable to the original method in accuracy. We also show that RENT is competitive with other methods in inference accuracy, often has a lower inference error rate, and can handle large data sets.
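The abstract does not specify the hidden Markov model, so the following is only a minimal sketch of the general idea under stated assumptions. It is not the authors' model and not RENT. Hidden states are assumed to be a fixed set of candidate local tree topologies, and each topology is encoded by its clades. A SNP is treated as fitting a rooted tree (infinite-sites assumption) when the set of sequences carrying the derived allele is one of that tree's clades. The per-site emission uses a hypothetical `error` probability for incompatible sites, and a hypothetical `switch_prob` stands in for recombination changing the local tree between adjacent SNPs. Viterbi decoding then returns the most probable tree for each site.

```python
import math
from typing import FrozenSet, List, Sequence, Set

Clade = FrozenSet[int]  # a clade = set of leaf (sequence) indices


def compatible(clades: Set[Clade], carriers: Clade) -> bool:
    """Infinite-sites check: derived-allele carriers must form a clade of the tree."""
    return len(carriers) == 0 or carriers in clades


def viterbi_local_trees(
    trees: Sequence[Set[Clade]],  # hypothetical candidate topologies, given by clades
    sites: Sequence[Clade],       # per SNP: sequences carrying the derived allele
    switch_prob: float = 0.05,    # hypothetical per-site chance of a tree change
    error: float = 0.01,          # hypothetical emission prob. of an incompatible SNP
) -> List[int]:
    """Return the index of the most probable local tree at each SNP (Viterbi)."""
    if not sites:
        return []
    k = len(trees)
    log_stay = math.log(1.0 - switch_prob)
    log_move = math.log(switch_prob / (k - 1)) if k > 1 else float("-inf")

    def log_emit(t: int, s: int) -> float:
        ok = compatible(trees[t], sites[s])
        return math.log(1.0 - error) if ok else math.log(error)

    def log_trans(i: int, j: int) -> float:
        return log_stay if i == j else log_move

    # Uniform prior over candidate trees at the first SNP.
    score = [math.log(1.0 / k) + log_emit(t, 0) for t in range(k)]
    back: List[List[int]] = []
    for s in range(1, len(sites)):
        ptr, new_score = [], []
        for j in range(k):
            # Best predecessor: keep the same tree or switch (a recombination).
            best = max(range(k), key=lambda i: score[i] + log_trans(i, j))
            ptr.append(best)
            new_score.append(score[best] + log_trans(best, j) + log_emit(j, s))
        back.append(ptr)
        score = new_score

    # Trace back from the best final state.
    path = [max(range(k), key=lambda t: score[t])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]


if __name__ == "__main__":
    leaves = [frozenset({i}) for i in range(4)]
    root = frozenset(range(4))
    tree_a = {frozenset({0, 1}), frozenset({2, 3}), root, *leaves}  # ((0,1),(2,3))
    tree_b = {frozenset({0, 2}), frozenset({1, 3}), root, *leaves}  # ((0,2),(1,3))
    snps = [frozenset(s) for s in ({0, 1}, {0, 1}, {2, 3}, {0, 2}, {1, 3}, {0, 2})]
    print(viterbi_local_trees([tree_a, tree_b], snps))  # [0, 0, 0, 1, 1, 1]
```

In this toy input, the first three SNPs fit only `tree_a` and the last three fit only `tree_b`. Decoding therefore places a single topology switch between the third and fourth SNPs. A realistic model would derive transitions from recombination rates and physical distances, and would not rely on the fixed, hypothetical parameters used here.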