Generalizations of the genomic rank distance to indels.

Affiliations

Institute of Computing, University of Campinas, Campinas, Brazil.

MRC Centre for Global Infectious Disease Analysis, School of Public Health, Imperial College, London, UK.

Publication information

Bioinformatics. 2023 Mar 1;39(3). doi: 10.1093/bioinformatics/btad087.

Abstract

MOTIVATION

The rank distance model represents genome rearrangements in multi-chromosomal genomes as matrix operations, which allows the reconstruction of parsimonious histories of evolution by rearrangements. We seek to generalize this model by allowing for genomes with different gene content, to accommodate a broader range of biological contexts. We approach this generalization by using a matrix representation of genomes. This leads to simple distance formulas and sorting algorithms for genomes with different gene contents, but without duplications.
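To make the matrix representation concrete, the following is a minimal sketch for the equal-gene-content case only, which is the starting point that this work generalizes. It assumes the usual formulation from the rank distance literature: a genome over n gene extremities is a symmetric 0/1 matrix with a 1 at (i, j) and (j, i) for each adjacency and a 1 at (k, k) for each telomere, and the distance between genomes A and B is rank(A - B). The function names and the extremity numbering are hypothetical and are not taken from the rank-indel repository; indel handling is not shown.

```python
from fractions import Fraction


def genome_matrix(n, adjacencies):
    """Build the symmetric 0/1 matrix of a genome with n gene extremities.

    adjacencies: iterable of pairs (i, j) of adjacent extremity indices.
    Every extremity that appears in no pair is treated as a telomere.
    """
    m = [[0] * n for _ in range(n)]
    paired = set()
    for i, j in adjacencies:
        m[i][j] = m[j][i] = 1
        paired.update((i, j))
    for k in range(n):
        if k not in paired:
            m[k][k] = 1  # telomere: extremity mapped to itself
    return m


def matrix_rank(rows):
    """Rank by Gaussian elimination over the rationals (exact arithmetic)."""
    a = [[Fraction(x) for x in row] for row in rows]
    rank = 0
    ncols = len(a[0]) if a else 0
    for col in range(ncols):
        pivot = next((r for r in range(rank, len(a)) if a[r][col] != 0), None)
        if pivot is None:
            continue
        a[rank], a[pivot] = a[pivot], a[rank]
        for r in range(rank + 1, len(a)):
            if a[r][col] != 0:
                f = a[r][col] / a[rank][col]
                a[r] = [x - f * y for x, y in zip(a[r], a[rank])]
        rank += 1
    return rank


def rank_distance(a, b):
    """Rank distance between two genome matrices of the same size."""
    diff = [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]
    return matrix_rank(diff)


# Two genes; extremities numbered 0 = 1t, 1 = 1h, 2 = 2t, 3 = 2h (assumed).
linear = genome_matrix(4, [(1, 2)])            # linear chromosome (1 2)
circular = genome_matrix(4, [(1, 2), (3, 0)])  # same genes, circularized
inverted = genome_matrix(4, [(1, 3)])          # linear chromosome (1 -2)

print(rank_distance(linear, circular))  # one join closes the chromosome
print(rank_distance(linear, inverted))  # one cut plus one join
```

In this formulation a cut or a join contributes 1 to the distance, so circularizing the chromosome costs 1 and inverting gene 2 costs 2. Exact rational arithmetic is used here only to keep the sketch free of external dependencies and floating-point tolerance issues.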

RESULTS

We generalize the rank distance to genomes with different gene content in two different ways. The first approach adds insertions, deletions and the substitution of a single extremity to the basic operations. We show how to efficiently compute this distance. To avoid genomes with incomplete markers, our alternative distance, the rank-indel distance, only uses insertions and deletions of entire chromosomes. We construct phylogenetic trees with our distances and the DCJ-Indel distance for simulated data and real prokaryotic genomes, and compare them against reference trees. For simulated data, our distances outperform the DCJ-Indel distance using the Quartet metric as baseline. This suggests that rank distances are more robust for comparing distantly related species. For real prokaryotic genomes, all rearrangement-based distances yield phylogenetic trees that are topologically distant from the reference (65% similarity with Quartet metric), but are able to cluster related species within their respective clades and distinguish the Shigella strains as the farthest relative of the Escherichia coli strains, a feature not seen in the reference tree.

AVAILABILITY AND IMPLEMENTATION

Code and instructions are available at https://github.com/meidanis-lab/rank-indel.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
