Multiple Profile Models Extract Features from Protein Sequence Data and Resolve Functional Diversity of Very Different Protein Families.

Affiliations

CNRS, IBPS, Laboratoire de Biologie Computationnelle et Quantitative - UMR 7238, Sorbonne Université, 4 place Jussieu, 75005 Paris, France.

Institut des Sciences du Calcul et des Données, Sorbonne Université, Paris, France.

Publication information

Mol Biol Evol. 2022 Apr 10;39(4). doi: 10.1093/molbev/msac070.

DOI: 10.1093/molbev/msac070
PMID: 35353898
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9016551/
Abstract

Functional classification of proteins from sequences alone has become a critical bottleneck in understanding the myriad of protein sequences that accumulate in our databases. The great diversity of homologous sequences hides, in many cases, a variety of functional activities that cannot be anticipated. Their identification appears critical for a fundamental understanding of the evolution of living organisms and for biotechnological applications. ProfileView is a sequence-based computational method, designed to functionally classify sets of homologous sequences. It relies on two main ideas: the use of multiple profile models whose construction explores evolutionary information in available databases, and a novel definition of a representation space in which to analyze sequences with multiple profile models combined together. ProfileView classifies protein families by enriching known functional groups with new sequences and discovering new groups and subgroups. We validate ProfileView on seven classes of widespread proteins involved in the interaction with nucleic acids, amino acids and small molecules, and in a large variety of functions and enzymatic reactions. ProfileView agrees with the large set of functional data collected for these proteins from the literature regarding the organization into functional subgroups and residues that characterize the functions. In addition, ProfileView resolves undefined functional classifications and extracts the molecular determinants underlying protein functional diversity, showing its potential to select sequences towards accurate experimental design and discovery of novel biological functions. On protein families with complex domain architecture, ProfileView functional classification reconciles domain combinations, unlike phylogenetic reconstruction. ProfileView proves to outperform the functional classification approach PANTHER, the two k-mer-based methods CUPP and eCAMI and a neural network approach based on Restricted Boltzmann Machines. It overcomes time complexity limitations of the latter.
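The two ideas named in the abstract (several profile models, and a representation space built from their combined scores) can be illustrated with a toy sketch. This is not ProfileView's implementation: the log-odds profiles, the uniform background, the ungapped window scoring, the single-linkage clustering, the toy sequences, and every function name below are assumptions chosen to keep the example short.

```python
import math
from itertools import combinations

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
BACKGROUND = 1.0 / len(AMINO_ACIDS)  # assumption: uniform background frequencies


def build_profile(aligned, pseudocount=1.0):
    """Turn equal-length, ungapped sequences into per-column log-odds scores."""
    profile = []
    for col in range(len(aligned[0])):
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for seq in aligned:
            counts[seq[col]] += 1.0
        total = sum(counts.values())
        profile.append(
            {aa: math.log((c / total) / BACKGROUND) for aa, c in counts.items()}
        )
    return profile


def best_score(profile, seq):
    """Best ungapped match of the profile over all windows of the sequence."""
    width = len(profile)
    if len(seq) < width:
        raise ValueError("sequence is shorter than the profile")
    return max(
        sum(profile[i][seq[start + i]] for i in range(width))
        for start in range(len(seq) - width + 1)
    )


def embed(seq, profiles):
    """Represent a sequence by its score against every profile (one axis each)."""
    return [best_score(p, seq) for p in profiles]


def cluster(vectors, k):
    """Naive single-linkage agglomerative clustering down to k groups."""
    groups = [[i] for i in range(len(vectors))]

    def linkage(a, b):
        return min(math.dist(vectors[i], vectors[j]) for i in a for j in b)

    while len(groups) > k:
        a, b = min(
            combinations(range(len(groups)), 2),
            key=lambda pair: linkage(groups[pair[0]], groups[pair[1]]),
        )
        groups[a] = groups[a] + groups[b]  # a < b, so deleting b is safe
        del groups[b]
    return [sorted(g) for g in groups]


# Hypothetical toy data: two small "subfamilies", each giving one profile.
profiles = [
    build_profile(["ACDK", "ACDR", "ACEK"]),
    build_profile(["WYFH", "WYFN", "WFFH"]),
]
sequences = ["GGACDKGG", "TTACEKTT", "GGWYFHGG", "TTWFFHTT"]
vectors = [embed(s, profiles) for s in sequences]
print(cluster(vectors, 2))
```

In this toy setting each sequence becomes a point with one coordinate per profile, so sequences matching the same profile land close together and are grouped by the clustering step. Real profile models are typically profile HMMs built from database searches, and the grouping step would normally be a tree over a much larger representation space.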

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/086c/9016551/8f3317f0001f/msac070f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/086c/9016551/65bd83f1f6b6/msac070f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/086c/9016551/c75892322f99/msac070f3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/086c/9016551/1f70747064dd/msac070f4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/086c/9016551/e59fa625f620/msac070f5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/086c/9016551/9b97037e0f85/msac070f6.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/086c/9016551/588b82800d96/msac070f7.jpg
