Discipline of Mathematics, University of Tasmania, Private Bag 37, Sandy Bay, Tasmania 7001, Australia.
J Bioinform Comput Biol. 2021 Dec;19(6):2140015. doi: 10.1142/S0219720021400151. Epub 2021 Nov 19.
Of the many modern approaches to calculating evolutionary distance via models of genome rearrangement, most are tied to a particular set of genomic modeling assumptions and to a restricted class of allowed rearrangements. The "position paradigm", in which genomes are represented as permutations signifying the position (and orientation) of each region, enables a refined model-based approach, where one can select biologically plausible rearrangements and assign to them relative probabilities/costs. Here, one must further incorporate any underlying structural symmetry of the genomes into the calculations and ensure that this symmetry is reflected in the model. In our recently introduced framework of genome algebras, each genome corresponds to an element that simultaneously incorporates all of its inherent physical symmetries. The representation theory of these algebras then provides a natural model of evolution via rearrangement as a Markov chain. Whilst using this framework to calculate distances for genomes with "practical" numbers of regions is currently computationally infeasible, we consider it to be a significant theoretical advance: one can incorporate different genomic modeling assumptions, calculate various genomic distances, and compare the results under different rearrangement models. The aim of this paper is to demonstrate some of these features.
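The ideas in the abstract (genomes as permutations, physical symmetry, and rearrangement as a Markov chain) can be illustrated with a small brute-force sketch. This is not the algebraic framework of the paper. It assumes an unsigned circular genome whose symmetries are rotations and reflections (the dihedral group), and it assumes a toy rearrangement model in which each swap of two adjacent regions is equally likely. The function names `dihedral_images`, `canonical`, `adjacent_swaps` and `step` are hypothetical and chosen for illustration only.

```python
from fractions import Fraction
from itertools import permutations


def dihedral_images(genome):
    """All arrangements obtained by rotating or flipping a circular genome.

    `genome` is a tuple whose i-th entry is the region at position i.
    Assumption: regions are unsigned, so a flip only reverses the order.
    """
    n = len(genome)
    images = set()
    for seq in (genome, genome[::-1]):
        for r in range(n):
            images.add(seq[r:] + seq[:r])
    return images


def canonical(genome):
    """Pick one representative (the smallest tuple) of the symmetry class."""
    return min(dihedral_images(genome))


def adjacent_swaps(genome):
    """Toy rearrangement model (an assumption): swap two neighbouring
    regions, with position n-1 adjacent to position 0 on the circle."""
    n = len(genome)
    results = []
    for i in range(n):
        j = (i + 1) % n
        g = list(genome)
        g[i], g[j] = g[j], g[i]
        results.append(tuple(g))
    return results


def step(dist):
    """One step of the Markov chain on symmetry classes.

    `dist` maps canonical genomes to probabilities. Working with class
    representatives is valid here because the set of adjacent swaps is
    itself unchanged by rotations and reflections of the positions.
    """
    new = {}
    for genome, p in dist.items():
        neighbours = adjacent_swaps(genome)
        for nb in neighbours:
            c = canonical(nb)
            new[c] = new.get(c, Fraction(0)) + p / len(neighbours)
    return new


# Usage: start from a reference genome with 5 regions and evolve 3 steps.
reference = canonical(tuple(range(5)))
dist = {reference: Fraction(1)}
for _ in range(3):
    dist = step(dist)
print(len({canonical(p) for p in permutations(range(5))}), "distinct genomes")
print(sum(dist.values()))  # total probability is conserved
```

Enumerating all permutations grows factorially with the number of regions, which gives a rough sense of why calculations for "practical" genome sizes become infeasible without further structure.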