Natural family-free genomic distance.

Authors

Rubert Diego P, Martinez Fábio V, Braga Marília D V

Affiliations

Faculdade de Computação, Universidade Federal de Mato Grosso do Sul, Campo Grande, Brazil.

Faculty of Technology and Center for Biotechnology (CeBiTec), Bielefeld University, Bielefeld, Germany.

Publication information

Algorithms Mol Biol. 2021 May 10;16(1):4. doi: 10.1186/s13015-021-00183-8.

DOI:10.1186/s13015-021-00183-8

PMID:33971908

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC8111734/

Abstract

BACKGROUND

A classical problem in comparative genomics is to compute the rearrangement distance, that is, the minimum number of large-scale rearrangements required to transform one given genome into another. Traditional approaches in this area are family-based, i.e., they require the DNA fragments of both genomes to be classified into families. Furthermore, the most elementary family-based models, which can compute distances in polynomial time, restrict each family to occur at most once in each genome. In contrast, distance computation in models that allow multifamilies (i.e., families with multiple occurrences) is NP-hard. Very recently, Bohnenkämper et al. (J Comput Biol 28:410-431, 2021) proposed an integer linear programming (ILP) formulation for computing the genomic distance of genomes with multifamilies, allowing structural rearrangements, represented by the generic double cut and join (DCJ) operation, and content-modifying insertions and deletions of DNA segments. This ILP is very efficient, but it must maximize the matching of the genes in each multifamily in order to prevent the free lunch artifact, which would otherwise let empty or almost empty matchings yield smaller distances.
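To make the polynomial-time case concrete, the following is a minimal sketch of the DCJ distance in the simplest setting: both genomes consist only of circular chromosomes and contain exactly the same genes, each occurring once. In that setting the well-known adjacency-graph formula gives the distance as the number of genes minus the number of cycles. The signed-integer genome encoding and the function names are assumptions made for this illustration; the sketch does not handle linear chromosomes, insertions and deletions, or multifamilies, and it is not the ILP discussed in the abstract.

```python
def extremities(gene):
    """Return (left, right) extremities of a signed gene; 't' = tail, 'h' = head."""
    g = abs(gene)
    return ((g, "t"), (g, "h")) if gene > 0 else ((g, "h"), (g, "t"))


def adjacencies(genome):
    """Map every gene extremity to its neighbouring extremity.

    `genome` is a list of circular chromosomes, each a list of signed integers
    (an assumed encoding: the sign gives the orientation of the gene).
    """
    partner = {}
    for chrom in genome:
        for i, gene in enumerate(chrom):
            nxt = chrom[(i + 1) % len(chrom)]  # circular successor
            right = extremities(gene)[1]
            left = extremities(nxt)[0]
            if right in partner or left in partner:
                raise ValueError("each gene must occur exactly once")
            partner[right] = left
            partner[left] = right
    return partner


def dcj_distance_circular(genome_a, genome_b):
    """DCJ distance n - c for circular genomes over the same unique genes."""
    adj_a, adj_b = adjacencies(genome_a), adjacencies(genome_b)
    if adj_a.keys() != adj_b.keys():
        raise ValueError("genomes must contain exactly the same genes")
    n = len(adj_a) // 2  # two extremities per gene
    seen, cycles = set(), 0
    for start in adj_a:
        if start in seen:
            continue
        cycles += 1
        x = start
        # Alternate between adjacencies of A and of B until the cycle closes.
        while x not in seen:
            seen.add(x)
            y = adj_a[x]
            seen.add(y)
            x = adj_b[y]
    return n - cycles
```

For example, `dcj_distance_circular([[1, 2, 3]], [[1, -2, 3]])` evaluates a single inversion, and `dcj_distance_circular([[1, 2]], [[1], [2]])` a single fission; each call returns 1.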

RESULTS

In this paper, we adopt the alternative family-free setting, which, instead of relying on family classification, simply uses the pairwise similarities between DNA fragments of both genomes to compute their rearrangement distance. We adapted the ILP mentioned above and developed a model in which pairwise similarities are used to assign weights to both matched and unmatched genes, so that an optimal solution does not necessarily maximize the matching. Our model thus yields a natural family-free genomic distance, which takes into consideration all given genes, without prior classification into families, and has a search space composed of matchings of any size. In spite of its larger search space, our ILP seems to benefit from a reduction in the number of co-optimal solutions due to the weights. Indeed, it converged faster than the original ILP by Bohnenkämper et al. for instances with the same number of multiple connections. We can handle not only bacterial genomes, but also genomes of fungi and insects, or sets of chromosomes of mammals and plants. In a comparison study of six fruit fly genomes, we obtained accurate results.
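The idea that weights on both matched and unmatched genes can make a non-maximum matching optimal can be illustrated with a small brute-force sketch. Everything in it is an assumption made for illustration: the gene names, the similarity values, and in particular the cost function, which charges `1 - w` for each matched pair and the best available similarity for each unmatched gene. It is not the objective function of the paper, and it omits the DCJ and indel rearrangement term entirely.

```python
from itertools import combinations

# Hypothetical pairwise similarities in (0, 1] between genes of genomes A and B.
SIMILARITY = {("a1", "b1"): 0.9, ("a1", "b2"): 0.3, ("a2", "b2"): 0.2}


def is_matching(edges):
    """True if no gene is used by more than one edge."""
    used = [g for edge in edges for g in edge]
    return len(used) == len(set(used))


def toy_cost(edges, similarity):
    """Assumed toy cost, not the paper's objective.

    Each matched pair costs 1 - w; each unmatched gene costs the similarity
    of its best candidate partner, so ignoring a strong candidate is expensive.
    """
    matched = {g for edge in edges for g in edge}
    cost = sum(1.0 - similarity[e] for e in edges)
    genes = {g for edge in similarity for g in edge}
    for g in sorted(genes - matched):
        cost += max(w for e, w in similarity.items() if g in e)
    return cost


def best_matching(similarity):
    """Exhaustively search matchings of every size, including the empty one."""
    edges = sorted(similarity)
    candidates = (
        subset
        for size in range(len(edges) + 1)
        for subset in combinations(edges, size)
        if is_matching(subset)
    )
    return min(candidates, key=lambda m: toy_cost(m, similarity))
```

With these assumed values, `best_matching(SIMILARITY)` selects only the pair `("a1", "b1")` at a toy cost of about 0.6, whereas the maximum matching that also pairs `a2` with `b2` costs about 0.9, and the empty matching is the most expensive of the three. Exhaustive enumeration is only feasible for tiny inputs, which is one reason an ILP solver is used in practice.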

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8af1/8111734/d937f0877fbd/13015_2021_183_Fig1_HTML.jpg

Cited by

Chromosomal gene order defines several structural classes of Staphylococcus epidermidis genomes.

PLoS One. 2024 Oct 4;19(10):e0311520. doi: 10.1371/journal.pone.0311520. eCollection 2024.

RIBAP: a comprehensive bacterial core genome annotation pipeline for pangenome calculation beyond the species level.

Genome Biol. 2024 Jul 1;25(1):170. doi: 10.1186/s13059-024-03312-9.

Efficient gene orthology inference via large-scale rearrangements.

Algorithms Mol Biol. 2023 Sep 28;18(1):14. doi: 10.1186/s13015-023-00238-y.

Generalizations of the genomic rank distance to indels.

Bioinformatics. 2023 Mar 1;39(3). doi: 10.1093/bioinformatics/btad087.

References

Computing the Rearrangement Distance of Natural Genomes.

J Comput Biol. 2021 Apr;28(4):410-431. doi: 10.1089/cmb.2020.0434. Epub 2020 Dec 30.

FlyBase: updates to the Drosophila melanogaster knowledge base.

Nucleic Acids Res. 2021 Jan 8;49(D1):D899-D907. doi: 10.1093/nar/gkaa1026.

OMA standalone: orthology inference among public and custom genomes and transcriptomes.

Genome Res. 2019 Jul;29(7):1152-1163. doi: 10.1101/gr.243212.118. Epub 2019 Jun 24.

MEGA X: Molecular Evolutionary Genetics Analysis across Computing Platforms.

Mol Biol Evol. 2018 Jun 1;35(6):1547-1549. doi: 10.1093/molbev/msy096.

TimeTree: A Resource for Timelines, Timetrees, and Divergence Times.

Mol Biol Evol. 2017 Jul 1;34(7):1812-1819. doi: 10.1093/molbev/msx116.

Approximating the DCJ distance of balanced genomes in linear time.

Algorithms Mol Biol. 2017 Mar 9;12:3. doi: 10.1186/s13015-017-0095-y. eCollection 2017.

Ancestral Chromatin Configuration Constrains Chromatin Evolution on Differentiating Sex Chromosomes in Drosophila.

PLoS Genet. 2015 Jun 26;11(6):e1005331. doi: 10.1371/journal.pgen.1005331. eCollection 2015 Jun.

On the family-free DCJ distance and similarity.

Algorithms Mol Biol. 2015 Apr 1;10:13. doi: 10.1186/s13015-015-0041-9. eCollection 2015.

An Exact Algorithm to Compute the Double-Cut-and-Join Distance for Genomes with Duplicate Genes.

J Comput Biol. 2015 May;22(5):425-35. doi: 10.1089/cmb.2014.0096. Epub 2014 Dec 17.

Inapproximability of (1,2)-exemplar distance.

IEEE/ACM Trans Comput Biol Bioinform. 2013 Nov-Dec;10(6):1384-90. doi: 10.1109/TCBB.2012.144.
