Lam Fumei, Langley Charles H, Song Yun S
Department of Computer Science, University of California, Davis, California, USA.
J Comput Biol. 2011 Mar;18(3):415-28. doi: 10.1089/cmb.2010.0270.
Abstract Given molecular genetic data from diploid individuals that, at present, reproduce mostly or exclusively asexually without recombination, an important problem in evolutionary biology is detecting evidence of past sexual reproduction (i.e., meiosis and mating) and recombination (both meiotic and mitotic). However, computational tools for carrying out such a study are currently lacking. In this article, we formulate a new problem of reconstructing diploid genealogies under the assumption of no sexual reproduction or recombination, with the ultimate goal of devising genealogy-based tools for testing deviation from these assumptions. We first consider the infinite-sites model of mutation and develop linear-time algorithms to test whether an asexual diploid genealogy compatible with that model exists, and to construct one if it does. In this ideal case, our chance of detecting signatures of past sexual reproduction is maximized. Then, we relax the infinite-sites assumption and develop an integer linear programming formulation to reconstruct asexual diploid genealogies with the minimum number of homoplasy (back or recurrent mutation) events. If this number is substantially larger than that expected for typical asexual organisms, this suggests that sexual reproduction or recombination may have played an important role in the evolutionary history. We apply our algorithms to simulated data sets with sizes of biological interest.
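As a rough illustration of what compatibility with the infinite-sites model means, the sketch below implements the classical four-gamete test for haploid binary sequences. This is not the linear-time diploid algorithm described in the abstract: it is the standard pairwise check, it runs in time quadratic in the number of sites, and it assumes haploid data coded as strings of '0' and '1'. The function name `four_gamete_compatible` and the example data are hypothetical.

```python
from itertools import combinations


def four_gamete_compatible(haplotypes):
    """Return True if no pair of sites shows all four gametes (00, 01, 10, 11).

    Assumption: each haplotype is an equal-length string of '0'/'1' characters.
    For such data, passing this test is equivalent to the existence of a
    perfect phylogeny, i.e. a genealogy with at most one mutation per site.
    """
    if not haplotypes:
        return True
    n_sites = len(haplotypes[0])
    if any(len(h) != n_sites for h in haplotypes):
        raise ValueError("all haplotypes must have the same length")

    # Check every pair of sites; quadratic in the number of sites.
    for i, j in combinations(range(n_sites), 2):
        gametes = {(h[i], h[j]) for h in haplotypes}
        if len(gametes) == 4:
            return False  # needs recombination or a repeated mutation
    return True


# Hypothetical example data, not taken from the article.
compatible = ["000", "100", "110", "001"]
incompatible = ["00", "01", "10", "11"]
print(four_gamete_compatible(compatible))
print(four_gamete_compatible(incompatible))
```

A failed test on haploid data means the sample cannot be explained by a single tree with one mutation per site, which is the kind of signal (recombination or homoplasy) that the genealogy-based approach in the abstract aims to detect in the diploid asexual setting.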