CompositeSearch: A Generalized Network Approach for Composite Gene Families Detection.

Affiliations

Institut de Biologie Paris-Seine (IBPS), UPMC Université Paris 06, Sorbonne Universités, Paris, France.

Département de Sciences Biologiques, Université de Montréal, Montréal, QC, Canada.

Publication information

Mol Biol Evol. 2018 Jan 1;35(1):252-255. doi: 10.1093/molbev/msx283.

DOI: 10.1093/molbev/msx283

PMID: 29092069

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5850286/

Abstract

Genes evolve by point mutations, but also by shuffling, fusion, and fission of genetic fragments. Therefore, similarity between two sequences can be due to common ancestry producing homology, and/or partial sharing of component fragments. Disentangling these processes is especially challenging in large molecular data sets, because of computational time. In this article, we present CompositeSearch, a memory-efficient, fast, and scalable method to detect composite gene families in large data sets (typically in the range of several million sequences). CompositeSearch generalizes the use of similarity networks to detect composite and component gene families with a greater recall, accuracy, and precision than recent programs (FusedTriplets and MosaicFinder). Moreover, CompositeSearch provides user-friendly quality descriptions regarding the distribution and primary sequence conservation of these gene families allowing critical biological analyses of these data.
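The core idea of separating full homology from partial sharing of component fragments can be illustrated with a small sketch. It is not CompositeSearch's implementation: it shows only the simpler non-transitive triplet test, in which a gene is a candidate composite when it aligns to two other genes on largely non-overlapping regions and those two genes are not similar to each other. The `Hit` record, the `find_composites` function, and the `max_overlap` threshold of 20 positions are all hypothetical names and values chosen for illustration.

```python
from collections import namedtuple
from itertools import combinations

# Hypothetical record for one pairwise similarity hit: the query gene,
# the subject gene, and the aligned interval on the query (1-based, inclusive).
Hit = namedtuple("Hit", "query subject q_start q_end")


def find_composites(hits, max_overlap=20):
    """Return {gene: [(component_a, component_b), ...]} for candidate composites.

    A gene is reported when two of its hits cover regions that overlap by at
    most `max_overlap` positions and the two subject genes share no hit.
    """
    by_query = {}     # gene -> list of its hits
    similar = set()   # unordered pairs of genes connected by at least one hit
    for h in hits:
        by_query.setdefault(h.query, []).append(h)
        similar.add(frozenset((h.query, h.subject)))

    composites = {}
    for gene, gene_hits in by_query.items():
        for a, b in combinations(gene_hits, 2):
            if a.subject == b.subject:
                continue
            # Components must not be similar to each other (non-transitivity).
            if frozenset((a.subject, b.subject)) in similar:
                continue
            # Overlap of the two aligned regions on the candidate composite;
            # negative values mean the regions are separated by a gap.
            overlap = min(a.q_end, b.q_end) - max(a.q_start, b.q_start) + 1
            if overlap <= max_overlap:
                composites.setdefault(gene, []).append((a.subject, b.subject))
    return composites


if __name__ == "__main__":
    example = [
        Hit("C", "A", 1, 100),    # first part of C matches A
        Hit("C", "B", 120, 250),  # second part of C matches B
        Hit("A", "C", 1, 100),
        Hit("B", "C", 1, 130),
    ]
    print(find_composites(example))
```

In this toy input, gene `C` is reported with components `A` and `B` because the two aligned regions on `C` do not overlap and no hit links `A` to `B`. According to the abstract, CompositeSearch generalizes this kind of network reasoning to composite and component gene families rather than to individual triplets.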

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e9c9/5850286/f7eb977a63c2/msx283f1.jpg

Similar articles

CompositeSearch: A Generalized Network Approach for Composite Gene Families Detection.

Mol Biol Evol. 2018 Jan 1;35(1):252-255. doi: 10.1093/molbev/msx283.

MosaicFinder: identification of fused gene families in sequence similarity networks.

Bioinformatics. 2013 Apr 1;29(7):837-44. doi: 10.1093/bioinformatics/btt049. Epub 2013 Jan 30.

Mulan: multiple-sequence local alignment and visualization for studying function and evolution.

Genome Res. 2005 Jan;15(1):184-94. doi: 10.1101/gr.3007205. Epub 2004 Dec 8.

On the quality of tree-based protein classification.

Bioinformatics. 2005 May 1;21(9):1876-90. doi: 10.1093/bioinformatics/bti244. Epub 2005 Jan 12.

Optimization of sequence alignments according to the number of sequences vs. number of sites trade-off.

BMC Bioinformatics. 2015 Jun 9;16:190. doi: 10.1186/s12859-015-0619-8.

Bayesian coestimation of phylogeny and sequence alignment.

BMC Bioinformatics. 2005 Apr 1;6:83. doi: 10.1186/1471-2105-6-83.

Phylogenetic exploration of bacterial genomic rearrangements.

Bioinformatics. 2007 May 1;23(9):1172-4. doi: 10.1093/bioinformatics/btm070. Epub 2007 Mar 1.

GATA: a graphic alignment tool for comparative sequence analysis.

BMC Bioinformatics. 2005 Jan 17;6:9. doi: 10.1186/1471-2105-6-9.

Prediction of function divergence in protein families using the substitution rate variation parameter alpha.

Mol Biol Evol. 2006 Jul;23(7):1406-13. doi: 10.1093/molbev/msl002. Epub 2006 May 3.

Computational analysis of evolution and conservation in a protein superfamily.

Methods. 2004 Feb;32(2):73-92. doi: 10.1016/s1046-2023(03)00200-7.

Cited by

An episodic burst of massive genomic rearrangements and the origin of non-marine annelids.

Nat Ecol Evol. 2025 Jun 18. doi: 10.1038/s41559-025-02728-1.

On the origin of mitochondria: a multilayer network approach.

PeerJ. 2023 Jan 6;11:e14571. doi: 10.7717/peerj.14571. eCollection 2023.

Hundreds of Out-of-Frame Remodeled Gene Families in the Escherichia coli Pangenome.

Mol Biol Evol. 2022 Jan 7;39(1). doi: 10.1093/molbev/msab329.

Phylogenomic fingerprinting of tempo and functions of horizontal gene transfer within ochrophytes.

Proc Natl Acad Sci U S A. 2021 Jan 26;118(4). doi: 10.1073/pnas.2009974118.

Automatic construction of molecular similarity networks for visual graph mining in chemical space of bioactive peptides: an unsupervised learning approach.

Sci Rep. 2020 Oct 22;10(1):18074. doi: 10.1038/s41598-020-75029-1.

Genome Biol Evol. 2020 Sep 1;12(9):1664-1678. doi: 10.1093/gbe/evaa117.

Characterization of Complex Core Genome and the Underlying Recombination and Positive Selection.

Front Genet. 2020 May 21;11:506. doi: 10.3389/fgene.2020.00506. eCollection 2020.

Ab Initio Construction and Evolutionary Analysis of Protein-Coding Gene Families with Partially Homologous Relationships: Closely Related Drosophila Genomes as a Case Study.

Genome Biol Evol. 2020 Mar 1;12(3):185-202. doi: 10.1093/gbe/evaa041.

Eukaryote Genes Are More Likely than Prokaryote Genes to Be Composites.

Genes (Basel). 2019 Aug 28;10(9):648. doi: 10.3390/genes10090648.

Reticulate evolution in eukaryotes: Origin and evolution of the nitrate assimilation pathway.

PLoS Genet. 2019 Feb 21;15(2):e1007986. doi: 10.1371/journal.pgen.1007986. eCollection 2019 Feb.

References

Functional innovation from changes in protein domains and their combinations.

Curr Opin Struct Biol. 2016 Jun;38:44-52. doi: 10.1016/j.sbi.2016.05.016. Epub 2016 Jun 13.

De Novo Genes Arise at a Slow but Steady Rate along the Primate Lineage and Have Been Subject to Incomplete Lineage Sorting.

Genome Biol Evol. 2016 Apr 25;8(4):1222-32. doi: 10.1093/gbe/evw074.

Protein networks identify novel symbiogenetic genes resulting from plastid endosymbiosis.

Proc Natl Acad Sci U S A. 2016 Mar 29;113(13):3579-84. doi: 10.1073/pnas.1517551113. Epub 2016 Mar 14.

Network-Thinking: Graphs to Analyze Microbial Complexity and Evolution.

Trends Microbiol. 2016 Mar;24(3):224-237. doi: 10.1016/j.tim.2015.12.003. Epub 2016 Jan 13.

Origins of De Novo Genes in Human and Chimpanzee.

PLoS Genet. 2015 Dec 31;11(12):e1005721. doi: 10.1371/journal.pgen.1005721. eCollection 2015 Dec.

Emergence of de novo proteins from 'dark genomic matter' by 'grow slow and moult'.

Biochem Soc Trans. 2015 Oct;43(5):867-73. doi: 10.1042/BST20150089.

New genes from non-coding sequence: the role of de novo protein-coding genes in eukaryotic evolutionary innovation.

Philos Trans R Soc Lond B Biol Sci. 2015 Sep 26;370(1678):20140332. doi: 10.1098/rstb.2014.0332.

Extensive gene remodeling in the viral world: new evidence for nongradual evolution in the mobilome network.

Genome Biol Evol. 2014 Aug 7;6(9):2195-205. doi: 10.1093/gbe/evu168.

A pluralistic account of homology: adapting the models to the data.

Mol Biol Evol. 2014 Mar;31(3):501-16. doi: 10.1093/molbev/mst228. Epub 2013 Nov 22.

MosaicFinder: identification of fused gene families in sequence similarity networks.

Bioinformatics. 2013 Apr 1;29(7):837-44. doi: 10.1093/bioinformatics/btt049. Epub 2013 Jan 30.
