Bochkareva Olga O, Dranenko Natalia O, Ocheredko Elena S, Kanevsky German M, Lozinsky Yaroslav N, Khalaycheva Vera A, Artamonova Irena I, Gelfand Mikhail S
Kharkevich Institute for Information Transmission Problems, Moscow, Russia.
Center for Data-Intensive Biomedicine and Biotechnology, Skolkovo Institute of Science and Technology, Moscow, Russia.
PeerJ. 2018 Mar 27;6:e4545. doi: 10.7717/peerj.4545. eCollection 2018.
Genome rearrangements have played an important role in the evolution of Yersinia pestis from its progenitor Yersinia pseudotuberculosis. Traditional phylogenetic trees for Y. pestis based on sequence comparison have short internal branches and low bootstrap support, as only a small number of nucleotide substitutions have occurred. On the other hand, even a small number of genome rearrangements may resolve topological ambiguities in a phylogenetic tree. We reconstructed phylogenetic trees based on genome rearrangements using several popular approaches, such as Maximum Likelihood for Gene Order and the Bayesian model of genome rearrangements by inversions. We also reconciled phylogenetic trees for each of the three CRISPR loci to obtain an integrated scenario of CRISPR cassette evolution. Analysis of contradictions between the obtained evolutionary trees revealed numerous parallel inversions and gain/loss events. Our data indicate that an integrated analysis of sequence-based and inversion-based trees enhances the resolution of phylogenetic reconstruction. In contrast, reconstructions of strain relationships based solely on CRISPR loci may not be reliable, as the history is obscured by large deletions that obliterate the order of spacer gains. Similarly, numerous parallel gene losses preclude reconstruction of phylogeny based on gene content.