da Silva Poly H, Machado Raphael, Dantas Simone, Braga Marília Dv
IME, Universidade Federal Fluminense, Niterói, Brazil.
Algorithms Mol Biol. 2013 Jul 23;8(1):21. doi: 10.1186/1748-7188-8-21.
Classical approaches to computing the genomic distance are usually limited to genomes with the same content and consider only rearrangements that change the organization of the genome (i.e., the positions and orientations of pieces of DNA, the number and type of chromosomes, etc.), such as inversions, translocations, fusions and fissions. These operations are generically represented by the double-cut and join (DCJ) operation. The distance between two genomes, measured as the number of DCJ operations, can be computed in linear time. To handle genomes with distinct contents, insertions and deletions of DNA fragments, called indels, must also be allowed. More powerful than an indel is a substitution of one DNA fragment by another. Indels and substitutions are called content-modifying operations. It has been shown that both the DCJ-indel and the DCJ-substitution distances can also be computed in linear time, assuming that the same cost is assigned to any DCJ or content-modifying operation.
In the present study we extend the DCJ-indel and the DCJ-substitution models by assuming that the content-modifying cost is distinct from, and upper bounded by, the DCJ cost, and show that the distance in both models can still be computed in linear time. Although the triangle inequality can be violated in both models, we also show how to efficiently fix this problem a posteriori.
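As an illustration of the plain DCJ distance mentioned above (genomes with the same content, no indels or substitutions), the following is a minimal sketch and not the method of this paper. It assumes the well-known adjacency-graph formula d = N - (C + I/2), where N is the number of genes, C the number of cycles and I the number of odd paths; the abstract does not spell this formula out. The genome encoding (a list of `(genes, is_circular)` pairs with signed integers) and the names `extremity_partners` and `dcj_distance` are hypothetical choices made for this example. Each gene is assumed to occur exactly once per genome, and chromosomes are assumed to be non-empty.

```python
from collections import defaultdict


def extremity_partners(genome):
    """Map each gene extremity to its neighbouring extremity (None at a telomere).

    genome: list of (genes, is_circular) pairs; genes is a non-empty list of
    non-zero signed integers (hypothetical encoding for this sketch).
    """
    partner = {}
    for genes, is_circular in genome:
        # A gene +g is read tail -> head; a gene -g is read head -> tail.
        ends = [((abs(g), "t"), (abs(g), "h")) if g > 0
                else ((abs(g), "h"), (abs(g), "t")) for g in genes]
        # Adjacency between the right end of one gene and the left end of the next.
        for (_, right), (left, _) in zip(ends, ends[1:]):
            partner[right], partner[left] = left, right
        first_left, last_right = ends[0][0], ends[-1][1]
        if is_circular:
            partner[last_right], partner[first_left] = first_left, last_right
        else:
            partner[first_left] = partner[last_right] = None
    return partner


def dcj_distance(genome_a, genome_b):
    """DCJ distance N - (C + I/2) for two genomes over the same gene set."""
    pa, pb = extremity_partners(genome_a), extremity_partners(genome_b)
    if pa.keys() != pb.keys():
        raise ValueError("this sketch needs genomes with identical gene content")

    # Union-find over extremities: extremities sharing an adjacency in either
    # genome belong to the same component of the adjacency graph.
    parent = {x: x for x in pa}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for x in pa:
        for y in (pa[x], pb[x]):
            if y is not None:
                parent[find(x)] = find(y)

    size = defaultdict(int)           # extremities (graph edges) per component
    has_telomere = defaultdict(bool)  # True for paths, False for cycles
    for x in pa:
        root = find(x)
        size[root] += 1
        has_telomere[root] |= pa[x] is None or pb[x] is None

    cycles = sum(1 for r in size if not has_telomere[r])
    odd_paths = sum(1 for r in size if has_telomere[r] and size[r] % 2 == 1)
    n_genes = len(pa) // 2
    return n_genes - cycles - odd_paths // 2


if __name__ == "__main__":
    a = [([1, 2, 3], False)]
    b = [([1, -2, 3], False)]  # gene 2 inverted
    print(dcj_distance(a, b))
```

The union-find keeps the sketch short at the price of a near-linear rather than strictly linear bound; a direct traversal of the components would avoid that overhead.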