CompositeSearch: A Generalized Network Approach for Composite Gene Families Detection.

Affiliations

Institut de Biologie Paris-Seine (IBPS), UPMC Université Paris 06, Sorbonne Universités, Paris, France.

Département de Sciences Biologiques, Université de Montréal, Montréal, QC, Canada.

Publication information

Mol Biol Evol. 2018 Jan 1;35(1):252-255. doi: 10.1093/molbev/msx283.

DOI: 10.1093/molbev/msx283
PMID: 29092069
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5850286/
Abstract

Genes evolve by point mutations, but also by shuffling, fusion, and fission of genetic fragments. Therefore, similarity between two sequences can be due to common ancestry producing homology, and/or partial sharing of component fragments. Disentangling these processes is especially challenging in large molecular data sets, because of computational time. In this article, we present CompositeSearch, a memory-efficient, fast, and scalable method to detect composite gene families in large data sets (typically in the range of several million sequences). CompositeSearch generalizes the use of similarity networks to detect composite and component gene families with a greater recall, accuracy, and precision than recent programs (FusedTriplets and MosaicFinder). Moreover, CompositeSearch provides user-friendly quality descriptions regarding the distribution and primary sequence conservation of these gene families allowing critical biological analyses of these data.
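
The abstract does not spell out the detection procedure, so the following is only a minimal sketch of the general idea behind composite detection in a sequence similarity network, not the CompositeSearch implementation. It assumes the commonly used "non-transitive triplet" criterion: a gene is a candidate composite when it is similar to two other genes over (nearly) non-overlapping regions while those two genes share no similarity with each other. The hit tuple layout, the function names, and the `max_overlap` default are hypothetical choices made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def build_network(hits):
    """Build a similarity network from pairwise hits.

    Each hit is a tuple (query, subject, q_start, q_end, s_start, s_end);
    this layout is an assumption for the sketch. The result maps each gene
    to {neighbor: (start, end)}, the region of that gene aligned to the neighbor.
    """
    regions = defaultdict(dict)
    for query, subject, q_start, q_end, s_start, s_end in hits:
        if query == subject:
            continue  # ignore self-hits
        regions[query][subject] = (q_start, q_end)
        regions[subject][query] = (s_start, s_end)
    return regions


def overlap_length(r1, r2):
    """Number of positions shared by two closed intervals (start, end)."""
    return max(0, min(r1[1], r2[1]) - max(r1[0], r2[0]) + 1)


def candidate_composites(hits, max_overlap=20):
    """Return {gene: [(component_a, component_b), ...]} for candidate composites.

    A gene qualifies when two of its neighbors are not connected to each
    other and align to regions of the gene that overlap by at most
    max_overlap positions (an illustrative threshold, not a published default).
    """
    regions = build_network(hits)
    composites = {}
    for gene, neighbors in regions.items():
        for a, b in combinations(sorted(neighbors), 2):
            if b in regions.get(a, {}):
                continue  # a and b are similar to each other: plain homology
            if overlap_length(neighbors[a], neighbors[b]) <= max_overlap:
                composites.setdefault(gene, []).append((a, b))
    return composites


# Toy example: C aligns to A on positions 1-100 and to B on positions 120-250,
# while A and B share no hit, so C is reported with components (A, B).
example_hits = [
    ("C", "A", 1, 100, 1, 100),
    ("C", "B", 120, 250, 1, 130),
    ("D", "A", 1, 95, 3, 98),
]
print(candidate_composites(example_hits))
```

This sketch enumerates neighbor pairs per gene, which grows quadratically with the number of neighbors. The abstract's emphasis on memory efficiency and scalability to millions of sequences suggests the actual program needs a more careful strategy, for example by working on gene families rather than on individual sequences.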

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e9c9/5850286/f7eb977a63c2/msx283f1.jpg

Similar articles

1
CompositeSearch: A Generalized Network Approach for Composite Gene Families Detection.
Mol Biol Evol. 2018 Jan 1;35(1):252-255. doi: 10.1093/molbev/msx283.
2
MosaicFinder: identification of fused gene families in sequence similarity networks.
Bioinformatics. 2013 Apr 1;29(7):837-44. doi: 10.1093/bioinformatics/btt049. Epub 2013 Jan 30.
3
Mulan: multiple-sequence local alignment and visualization for studying function and evolution.
Genome Res. 2005 Jan;15(1):184-94. doi: 10.1101/gr.3007205. Epub 2004 Dec 8.
4
On the quality of tree-based protein classification.
Bioinformatics. 2005 May 1;21(9):1876-90. doi: 10.1093/bioinformatics/bti244. Epub 2005 Jan 12.
5
Optimization of sequence alignments according to the number of sequences vs. number of sites trade-off.
BMC Bioinformatics. 2015 Jun 9;16:190. doi: 10.1186/s12859-015-0619-8.
6
Bayesian coestimation of phylogeny and sequence alignment.
BMC Bioinformatics. 2005 Apr 1;6:83. doi: 10.1186/1471-2105-6-83.
7
Phylogenetic exploration of bacterial genomic rearrangements.
Bioinformatics. 2007 May 1;23(9):1172-4. doi: 10.1093/bioinformatics/btm070. Epub 2007 Mar 1.
8
GATA: a graphic alignment tool for comparative sequence analysis.
BMC Bioinformatics. 2005 Jan 17;6:9. doi: 10.1186/1471-2105-6-9.
9
Prediction of function divergence in protein families using the substitution rate variation parameter alpha.
Mol Biol Evol. 2006 Jul;23(7):1406-13. doi: 10.1093/molbev/msl002. Epub 2006 May 3.
10
Computational analysis of evolution and conservation in a protein superfamily.
Methods. 2004 Feb;32(2):73-92. doi: 10.1016/s1046-2023(03)00200-7.

Cited by

1
An episodic burst of massive genomic rearrangements and the origin of non-marine annelids.
Nat Ecol Evol. 2025 Jun 18. doi: 10.1038/s41559-025-02728-1.
2
On the origin of mitochondria: a multilayer network approach.
PeerJ. 2023 Jan 6;11:e14571. doi: 10.7717/peerj.14571. eCollection 2023.
3
Hundreds of Out-of-Frame Remodeled Gene Families in the Escherichia coli Pangenome.
Mol Biol Evol. 2022 Jan 7;39(1). doi: 10.1093/molbev/msab329.
4
Phylogenomic fingerprinting of tempo and functions of horizontal gene transfer within ochrophytes.
Proc Natl Acad Sci U S A. 2021 Jan 26;118(4). doi: 10.1073/pnas.2009974118.
5
Automatic construction of molecular similarity networks for visual graph mining in chemical space of bioactive peptides: an unsupervised learning approach.
Sci Rep. 2020 Oct 22;10(1):18074. doi: 10.1038/s41598-020-75029-1.
6
Gene Similarity Networks Unveil a Potential Novel Unicellular Group Closely Related to Animals from the Tara Oceans Expedition.
Genome Biol Evol. 2020 Sep 1;12(9):1664-1678. doi: 10.1093/gbe/evaa117.
7
Characterization of Complex Core Genome and the Underlying Recombination and Positive Selection.
Front Genet. 2020 May 21;11:506. doi: 10.3389/fgene.2020.00506. eCollection 2020.
8
Ab Initio Construction and Evolutionary Analysis of Protein-Coding Gene Families with Partially Homologous Relationships: Closely Related Drosophila Genomes as a Case Study.
Genome Biol Evol. 2020 Mar 1;12(3):185-202. doi: 10.1093/gbe/evaa041.
9
Eukaryote Genes Are More Likely than Prokaryote Genes to Be Composites.
Genes (Basel). 2019 Aug 28;10(9):648. doi: 10.3390/genes10090648.
10
Reticulate evolution in eukaryotes: Origin and evolution of the nitrate assimilation pathway.
PLoS Genet. 2019 Feb 21;15(2):e1007986. doi: 10.1371/journal.pgen.1007986. eCollection 2019 Feb.

References

1
Functional innovation from changes in protein domains and their combinations.
Curr Opin Struct Biol. 2016 Jun;38:44-52. doi: 10.1016/j.sbi.2016.05.016. Epub 2016 Jun 13.
2
De Novo Genes Arise at a Slow but Steady Rate along the Primate Lineage and Have Been Subject to Incomplete Lineage Sorting.
Genome Biol Evol. 2016 Apr 25;8(4):1222-32. doi: 10.1093/gbe/evw074.
3
Protein networks identify novel symbiogenetic genes resulting from plastid endosymbiosis.
Proc Natl Acad Sci U S A. 2016 Mar 29;113(13):3579-84. doi: 10.1073/pnas.1517551113. Epub 2016 Mar 14.
4
Network-Thinking: Graphs to Analyze Microbial Complexity and Evolution.
Trends Microbiol. 2016 Mar;24(3):224-237. doi: 10.1016/j.tim.2015.12.003. Epub 2016 Jan 13.
5
Origins of De Novo Genes in Human and Chimpanzee.
PLoS Genet. 2015 Dec 31;11(12):e1005721. doi: 10.1371/journal.pgen.1005721. eCollection 2015 Dec.
6
Emergence of de novo proteins from 'dark genomic matter' by 'grow slow and moult'.
Biochem Soc Trans. 2015 Oct;43(5):867-73. doi: 10.1042/BST20150089.
7
New genes from non-coding sequence: the role of de novo protein-coding genes in eukaryotic evolutionary innovation.
Philos Trans R Soc Lond B Biol Sci. 2015 Sep 26;370(1678):20140332. doi: 10.1098/rstb.2014.0332.
8
Extensive gene remodeling in the viral world: new evidence for nongradual evolution in the mobilome network.
Genome Biol Evol. 2014 Aug 7;6(9):2195-205. doi: 10.1093/gbe/evu168.
9
A pluralistic account of homology: adapting the models to the data.
Mol Biol Evol. 2014 Mar;31(3):501-16. doi: 10.1093/molbev/mst228. Epub 2013 Nov 22.
10
MosaicFinder: identification of fused gene families in sequence similarity networks.
Bioinformatics. 2013 Apr 1;29(7):837-44. doi: 10.1093/bioinformatics/btt049. Epub 2013 Jan 30.