Vinh Le Sy, Varón Andrés, Wheeler Ward C
Division of Invertebrate Zoology, American Museum of Natural History, USA.
Genome Inform. 2006;17(2):141-51.
The increase of available genomes poses new optimization problems in genome comparisons. A genome can be considered as a sequence of characters (loci) which are genes or segments of nucleotides. Genomes are subject to both nucleotide transformation and character order rearrangement processes. In this context, we define a problem of so-called pairwise alignment with rearrangements (PAR) between two genomes. The PAR generalizes the ordinary pairwise alignment by allowing the rearrangement of character order. The objective is to find the optimal PAR that minimizes the total cost which is composed of three factors: the edit cost between characters, the deletion/insertion cost of characters, and the rearrangement cost between character orders. To this end, we propose simple and effective heuristic methods: character moving and simultaneous character swapping. The efficiency of the methods is tested on Metazoa mitochondrial genomes. Experiments show that, pairwise alignments with rearrangements give better performance than ordinary pairwise alignments without rearrangements. The best proposed method, simultaneous character swapping, is implemented as an essential subroutine in our software POY version 4.0 to reconstruct genome-based phylogenies.
可用基因组数量的增加给基因组比较带来了新的优化问题。基因组可被视为由基因或核苷酸片段组成的字符(位点)序列。基因组会经历核苷酸转化和字符顺序重排过程。在此背景下,我们定义了两个基因组之间所谓的带重排的成对比对(PAR)问题。PAR通过允许字符顺序重排来推广普通的成对比对。目标是找到最优的PAR,使由三个因素组成的总成本最小化:字符间的编辑成本、字符的删除/插入成本以及字符顺序间的重排成本。为此,我们提出了简单有效的启发式方法:字符移动和同时字符交换。这些方法的效率在后生动物线粒体基因组上进行了测试。实验表明,带重排的成对比对比无重排的普通成对比对表现更好。所提出的最佳方法,即同时字符交换,在我们的软件POY版本4.0中作为一个基本子程序来重建基于基因组的系统发育树。