ISEM, Université de Montpellier, CNRS, IRD, EPHE, Montpellier, France.
Univ Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Evolutive UMR5558, 43 Boulevard du 11 novembre 1918, Villeurbanne cedex, 69622, France.
BMC Genomics. 2018 May 9;19(Suppl 2):96. doi: 10.1186/s12864-018-4466-7.
Genome rearrangements carry valuable information for phylogenetic inference or the elucidation of molecular mechanisms of adaptation. However, their detection is often hampered by current deficiencies in data and methods: genomes obtained from short sequence reads generally have very fragmented assemblies, and comparing multiple gene orders generally leads to computationally intractable algorithmic problems.
We present a computational method, ADSEQ, which overcomes these two limitations by combining ancestral gene order reconstruction, comparative scaffolding and de novo scaffolding methods. ADSEQ simultaneously provides improved assemblies and ancestral genomes, with statistical support for all local features. Compared to previous comparative methods, it runs in polynomial time, it samples solutions in a probabilistic space, and it can handle a significantly larger gene complement from the considered extant genomes, with complex histories including gene duplications and losses. We use ADSEQ to provide improved assemblies of 18 complete Anopheles genomes, including several important malaria vectors, together with a genome history made of duplications, losses, gene translocations and rearrangements. We also provide additional support for a differentiated mode of evolution of the sex chromosome and of the autosomes in these mosquito genomes.
We demonstrate the method's ability to improve extant assemblies accurately through a procedure simulating realistic assembly fragmentation. We study a debated issue regarding the phylogeny of the Gambiae complex group of Anopheles genomes in the light of the evolution of chromosomal rearrangements. The results suggest that the phylogenetic signal carried by rearrangements can differ from the signal carried by gene sequences, which are more prone to introgression.