Generalizations of the genomic rank distance to indels.

Affiliations

Institute of Computing, University of Campinas, Campinas, Brazil.

MRC Centre for Global Infectious Disease Analysis, School of Public Health, Imperial College, London, UK.

Publication information

Bioinformatics. 2023 Mar 1;39(3). doi: 10.1093/bioinformatics/btad087.

DOI: 10.1093/bioinformatics/btad087

PMID: 36790056

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9985151/

Abstract

MOTIVATION

The rank distance model represents genome rearrangements in multi-chromosomal genomes as matrix operations, which allows the reconstruction of parsimonious histories of evolution by rearrangements. We seek to generalize this model by allowing for genomes with different gene content, to accommodate a broader range of biological contexts. We approach this generalization by using a matrix representation of genomes. This leads to simple distance formulas and sorting algorithms for genomes with different gene contents, but without duplications.
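
As a rough illustration of the matrix view described above, the sketch below covers only the basic rank distance between two genomes with the same gene content, not the generalizations to indels. It assumes the usual convention from the rank distance literature: each gene contributes a tail and a head extremity, an adjacency places 1s symmetrically between its two extremities, a telomere places a 1 on the diagonal, and the distance is the rank of the difference of the two genome matrices. The function names and extremity labels are illustrative assumptions and are not taken from the rank-indel repository.

```python
import numpy as np


def genome_matrix(extremities, adjacencies):
    """Build a 0/1 genome matrix over the given extremities.

    Adjacent extremities map to each other (symmetric off-diagonal 1s);
    every extremity that is not in an adjacency is a telomere and maps
    to itself (a 1 on the diagonal).
    """
    index = {e: i for i, e in enumerate(extremities)}
    m = np.zeros((len(extremities), len(extremities)), dtype=int)
    paired = set()
    for a, b in adjacencies:
        i, j = index[a], index[b]
        m[i, j] = m[j, i] = 1
        paired.update((a, b))
    for e in extremities:
        if e not in paired:
            m[index[e], index[e]] = 1
    return m


def rank_distance(a, b):
    """Rank distance between two genome matrices of equal size: rank(a - b)."""
    return int(np.linalg.matrix_rank(a - b))


# Hypothetical example with three genes; "1t"/"1h" are the tail/head of gene 1.
ext = ["1t", "1h", "2t", "2h", "3t", "3h"]
linear = genome_matrix(ext, [("1h", "2t"), ("2h", "3t")])    # chromosome (1 2 3)
inverted = genome_matrix(ext, [("1h", "2h"), ("2t", "3t")])  # chromosome (1 -2 3)
split = genome_matrix(ext, [("1h", "2t")])                   # chromosomes (1 2) and (3)

print(rank_distance(linear, inverted))  # inversion of gene 2
print(rank_distance(linear, split))     # a single cut
```

Under this convention a cut or a join changes the rank by 1, while an operation that rewires two adjacencies at once, such as the inversion above, changes it by 2.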

RESULTS

We generalize the rank distance to genomes with different gene content in two different ways. The first approach adds insertions, deletions and the substitution of a single extremity to the basic operations. We show how to efficiently compute this distance. To avoid genomes with incomplete markers, our alternative distance, the rank-indel distance, only uses insertions and deletions of entire chromosomes. We construct phylogenetic trees with our distances and the DCJ-Indel distance for simulated data and real prokaryotic genomes, and compare them against reference trees. For simulated data, our distances outperform the DCJ-Indel distance using the Quartet metric as baseline. This suggests that rank distances are more robust for comparing distantly related species. For real prokaryotic genomes, all rearrangement-based distances yield phylogenetic trees that are topologically distant from the reference (65% similarity with Quartet metric), but are able to cluster related species within their respective clades and distinguish the Shigella strains as the farthest relative of the Escherichia coli strains, a feature not seen in the reference tree.

AVAILABILITY AND IMPLEMENTATION

Code and instructions are available at https://github.com/meidanis-lab/rank-indel.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
