Double cut and join with insertions and deletions.

Author information

Braga Marília D V, Willing Eyla, Stoye Jens

Affiliation

Technische Fakultät, Universität Bielefeld, Bielefeld, Germany.

Publication information

J Comput Biol. 2011 Sep;18(9):1167-84. doi: 10.1089/cmb.2011.0118.

Abstract

Many approaches to computing the genomic distance are still limited to genomes with the same content and without duplicated markers. However, differences in gene content are frequently observed and can reflect important evolutionary aspects. While duplicated markers can hardly be handled by exact models, a few polynomial time algorithms that include genome rearrangements, insertions and deletions have already been proposed for the case in which duplicated markers are not allowed. In an attempt to improve these results, in the present work we give the first linear time algorithm to compute the distance between two multichromosomal genomes with unequal content, but without duplicated markers, considering insertions, deletions and double cut and join (DCJ) operations. From this approach we derive algorithms to sort one genome into another, also using DCJ operations, insertions and deletions. The optimal sorting scenarios can have different compositions, and we compare two types of sorting scenarios: one that maximizes and one that minimizes the number of DCJ operations with respect to the number of insertions and deletions. We also show that, although the triangle inequality can be violated by the proposed genomic distance, it is possible to correct this problem by adopting a surcharge on the number of non-common markers. We use our method to analyze six species of Rickettsia, a group of obligate intracellular parasites, and identify preliminary evidence of clusters of deletions.
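For orientation, the equal-content case that the abstract takes as its starting point can be sketched in a few lines. The block below is not the linear time DCJ-indel algorithm of the paper. It is a minimal illustration of the standard adjacency-graph formula for the plain DCJ distance, d = N - (C + I/2), where N is the number of markers, C the number of cycles and I the number of odd paths. It assumes both genomes contain exactly the same markers, with no duplicates and no empty chromosomes. The genome encoding (a list of `(signed_markers, is_circular)` pairs) and the function names `extremity_partners` and `dcj_distance` are hypothetical choices made for this example.

```python
def extremity_partners(genome):
    """Map each marker extremity to its adjacent extremity (None at a telomere).

    Assumed encoding: genome = [(signed_markers, is_circular), ...],
    e.g. [([1, -2, 3], False)] is one linear chromosome 1 -2 3.
    """
    partner = {}
    for markers, circular in genome:
        # Each signed marker contributes (left, right) extremities in reading order.
        ends = []
        for m in markers:
            tail, head = (abs(m), "t"), (abs(m), "h")
            ends.append((tail, head) if m > 0 else (head, tail))
        # Consecutive markers form an adjacency between right and left extremities.
        for (_, right), (left, _) in zip(ends, ends[1:]):
            partner[right], partner[left] = left, right
        first_left, last_right = ends[0][0], ends[-1][1]
        if circular:
            partner[last_right], partner[first_left] = first_left, last_right
        else:
            partner[first_left] = partner[last_right] = None
    return partner


def dcj_distance(genome_a, genome_b):
    """Plain DCJ distance N - (C + I/2) for genomes with identical content."""
    pa, pb = extremity_partners(genome_a), extremity_partners(genome_b)
    if pa.keys() != pb.keys():
        raise ValueError("this sketch only covers genomes with identical markers")
    n_markers = len(pa) // 2
    seen = set()

    # Paths: start at every telomere of either genome and alternate genomes.
    odd_paths = 0
    for start_side, other_side in ((pa, pb), (pb, pa)):
        for x in pa:
            if start_side[x] is not None or x in seen:
                continue
            seen.add(x)
            edges, cur, side, nxt_side = 1, x, other_side, start_side
            while side[cur] is not None:
                cur = side[cur]
                seen.add(cur)
                edges += 1
                side, nxt_side = nxt_side, side
            odd_paths += edges % 2

    # Cycles: every extremity not on a path lies on exactly one cycle.
    cycles = 0
    for x in pa:
        if x in seen:
            continue
        cycles += 1
        cur, side, nxt_side = x, pa, pb
        while cur not in seen:
            seen.add(cur)
            cur = side[cur]
            side, nxt_side = nxt_side, side

    return n_markers - (cycles + odd_paths // 2)


# One inversion separates 1 2 3 from 1 -2 3.
print(dcj_distance([([1, 2, 3], False)], [([1, -2, 3], False)]))  # 1
```

Handling unequal content, as the paper does, additionally requires accounting for the markers that occur in only one genome. That part is deliberately left out here, since the abstract does not give the details.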
