Lyubetsky Vassily, Gershgorin Roman, Seliverstov Alexander, Gorbunov Konstantin
Institute for Information Transmission Problems of the Russian Academy of Sciences (Kharkevich Institute), Bolshoi Karetnyi lane, 19, 127051, Moscow, Russia.
BMC Bioinformatics. 2016 Jan 19;17:40. doi: 10.1186/s12859-016-0878-z.
One of the main aims of phylogenomics is the reconstruction of objects defined in the leaves along the whole phylogenetic tree to minimize the specified functional, which may also include the phylogenetic tree generation. Such objects can include nucleotide and amino acid sequences, chromosomal structures, etc. The structures can have any set of linear and circular chromosomes, variable gene composition and include any number of paralogs, as well as any weights of individual evolutionary operations to transform a chromosome structure. Many heuristic algorithms were proposed for this purpose, but there are just a few exact algorithms with low (linear, cubic or similar) polynomial computational complexity among them to our knowledge. The algorithms naturally start from the calculation of both the distance between two structures and the shortest sequence of operations transforming one structure into another. Such calculation per se is an NP-hard problem.
A general model of chromosomal structure rearrangements is considered. Exact algorithms with almost linear or cubic polynomial complexities have been developed to solve the problems for the case of any chromosomal structure but with certain limitations on operation weights. The computer programs are tested on biological data for the problem of mitochondrial or plastid chromosomal structure reconstruction. To our knowledge, no computer programs are available for this model.
Exactness of the proposed algorithms and such low polynomial complexities were proved. The reconstructed evolutionary trees of mitochondrial and plastid chromosomal structures as well as the ancestral states of the structures appear to be reasonable.
系统发育基因组学的主要目标之一是沿着整个系统发育树重建叶节点中定义的对象,以最小化特定功能,这可能还包括系统发育树的生成。此类对象可以包括核苷酸和氨基酸序列、染色体结构等。这些结构可以具有任何线性和环状染色体组合、可变的基因组成,包括任意数量的旁系同源物,以及用于转换染色体结构的任何个体进化操作权重。为此目的提出了许多启发式算法,但据我们所知,其中只有少数具有低(线性、立方或类似)多项式计算复杂度的精确算法。这些算法自然地从计算两个结构之间的距离以及将一个结构转换为另一个结构的最短操作序列开始。这种计算本身就是一个NP难问题。
考虑了染色体结构重排的一般模型。已经开发出具有几乎线性或立方多项式复杂度的精确算法,以解决任何染色体结构情况下的问题,但对操作权重有一定限制。针对线粒体或质体染色体结构重建问题对计算机程序进行了生物学数据测试。据我们所知,没有适用于此模型的计算机程序。
所提出算法的精确性以及如此低的多项式复杂度得到了证明。重建的线粒体和质体染色体结构进化树以及结构的祖先状态似乎是合理的。