International Laboratory "Computer Technologies", ITMO University, Saint Petersburg, Russia.
Lyda Hill Department of Bioinformatics, University of Texas Southwestern Medical Center, Dallas, Texas, USA.
J Math Biol. 2023 Jul 10;87(2):25. doi: 10.1007/s00285-023-01955-z.
Genome rearrangements are evolutionary events that shuffle genomic architectures. The number of genome rearrangements that occurred between two genomes is often used as the evolutionary distance between the corresponding species. This number is usually estimated as the minimum number of rearrangements required to transform one genome into the other, but such estimates are reliable only for closely related genomes. For genomes that have diverged substantially, they tend to underestimate the evolutionary distance, and advanced statistical methods can be used to improve accuracy. Several statistical estimators have been developed under various evolutionary models; the most complete of these, INFER, accounts for varying degrees of genome fragility. We present TruEst, an efficient tool that estimates the evolutionary distance between two genomes under the INFER model of genome rearrangements. We apply our method to both simulated and real data. TruEst shows high accuracy on simulated data. On real datasets of mammalian genomes, the method identified several pairs of genomes whose estimated distances are highly consistent with previous ancestral reconstruction studies.
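The central point above, that a minimum (parsimony) count underestimates the true number of events once genomes have diverged far, can be illustrated with a toy simulation. The sketch below is not TruEst and does not implement the INFER model. It assumes the simplest possible setting: a single linear chromosome of n genes evolving by random reversals with uniform breakage probability (no variation in fragility), ignoring the rare chance restoration of a broken adjacency. The helper names `random_reversal`, `count_breakpoints` and `estimate_reversals` are hypothetical. Under these assumptions each reversal breaks 2 of the m = n + 1 adjacencies, so an ancestral adjacency survives k reversals with probability about (1 - 2/m)^k. Inverting the expected breakpoint count b ≈ m(1 - (1 - 2/m)^k) gives the estimate k̂ = ln(1 - b/m) / ln(1 - 2/m), whereas the classical parsimony-style bound is only ceil(b/2).

```python
import math
import random


def random_reversal(perm, rng):
    """Reverse a random segment of a signed permutation in place (toy model).

    `perm` holds genes 1..n; framing elements 0 and n+1 are implicit.
    Picking two distinct gaps i < j out of the n+1 gaps cuts exactly two adjacencies.
    """
    n = len(perm)
    i, j = sorted(rng.sample(range(n + 1), 2))
    # Genes lying between gap i and gap j are perm[i:j]; reverse order and flip signs.
    perm[i:j] = [-g for g in reversed(perm[i:j])]


def count_breakpoints(perm):
    """Count breakpoints relative to the identity genome 1..n (with framing 0, n+1).

    A consecutive pair (x, y) is an ancestral adjacency iff y - x == 1, which
    covers both orientations: (a, a+1) and (-(a+1), -a).
    """
    ext = [0] + perm + [len(perm) + 1]
    return sum(1 for x, y in zip(ext, ext[1:]) if y - x != 1)


def estimate_reversals(b, n):
    """Invert b ≈ m(1 - (1 - 2/m)^k) under the hypothetical uniform-breakage model."""
    m = n + 1  # number of adjacencies, including the two framing ones
    if b >= m:
        raise ValueError("saturated: every adjacency is broken")
    return math.log(1 - b / m) / math.log(1 - 2 / m)


if __name__ == "__main__":
    rng = random.Random(1)
    n, k = 1000, 800
    perm = list(range(1, n + 1))
    for _ in range(k):
        random_reversal(perm, rng)
    b = count_breakpoints(perm)
    print("true reversals:", k)
    print("breakpoint lower bound:", math.ceil(b / 2))
    print("uniform-breakage estimate:", round(estimate_reversals(b, n)))
```

With these parameters roughly 800 of the 1001 adjacencies end up broken, so the breakpoint bound sits near 400, about half the simulated 800 reversals, while the statistical estimate should land close to 800. Real genomes violate the uniform-breakage assumption, which is the gap that fragility-aware models such as INFER are designed to address.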