ProteInfer, deep neural networks for protein functional inference.

Affiliations

The Francis Crick Institute, London, United Kingdom.

Google AI, Boston, United States.

Publication information

Elife. 2023 Feb 27;12:e80942. doi: 10.7554/eLife.80942.

DOI: 10.7554/eLife.80942
PMID: 36847334
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10063232/
Abstract

Predicting the function of a protein from its amino acid sequence is a long-standing challenge in bioinformatics. Traditional approaches use sequence alignment to compare a query sequence either to thousands of models of protein families or to large databases of individual protein sequences. Here we introduce ProteInfer, which instead employs deep convolutional neural networks to predict a variety of protein functions - Enzyme Commission (EC) numbers and Gene Ontology (GO) terms - directly from an unaligned amino acid sequence. This approach provides precise predictions which complement alignment-based methods, and the computational efficiency of a single neural network permits novel and lightweight software interfaces, which we demonstrate with an in-browser graphical interface for protein function prediction in which all computation is performed on the user's personal computer with no data uploaded to remote servers. Moreover, these models place full-length amino acid sequences into a generalised functional space, facilitating downstream analysis and interpretation. To read the interactive version of this paper, please visit https://google-research.github.io/proteinfer/.
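
The data flow described in the abstract (an unaligned sequence goes in, a convolutional network turns it into a fixed-length vector, and independent scores come out for many function labels) can be illustrated with a minimal pure-Python sketch. Everything specific here is an assumption made for illustration: the filter width, the number of filters, the three unnamed labels, and the random untrained weights. It is not the published ProteInfer architecture or its trained model, and the scores it produces carry no biological meaning.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def one_hot(sequence):
    """Encode each residue as a 20-dimensional indicator vector."""
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    rows = []
    for aa in sequence.upper():
        row = [0.0] * len(AMINO_ACIDS)
        if aa in index:  # non-standard residues stay all-zero
            row[index[aa]] = 1.0
        rows.append(row)
    return rows


def conv1d_relu(rows, kernels, width):
    """Zero-padded 1D convolution over positions (odd width), then ReLU."""
    half = width // 2
    n_in = len(rows[0])
    pad = [[0.0] * n_in] * half
    padded = pad + rows + pad
    out = []
    for pos in range(len(rows)):
        window = [v for row in padded[pos:pos + width] for v in row]
        out.append([max(0.0, sum(w * v for w, v in zip(k, window)))
                    for k in kernels])
    return out


def embed(sequence, kernels, width):
    """Average per-position features into one fixed-length vector."""
    feats = conv1d_relu(one_hot(sequence), kernels, width)
    return [sum(col) / len(feats) for col in zip(*feats)]


def predict(sequence, kernels, width, head):
    """One independent sigmoid score per label (multi-label output)."""
    emb = embed(sequence, kernels, width)
    return [1.0 / (1.0 + math.exp(-sum(w * e for w, e in zip(ws, emb))))
            for ws in head]


# Hypothetical sizes and random weights, standing in for a trained model.
rng = random.Random(0)
WIDTH, N_FILTERS, N_LABELS = 5, 8, 3
kernels = [[rng.uniform(-0.5, 0.5) for _ in range(WIDTH * len(AMINO_ACIDS))]
           for _ in range(N_FILTERS)]
head = [[rng.uniform(-0.5, 0.5) for _ in range(N_FILTERS)]
        for _ in range(N_LABELS)]

# Sequences of different lengths map to outputs of the same size.
short = predict("MKTAYIAKQR", kernels, WIDTH, head)
longer = predict("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ", kernels, WIDTH, head)
```

Averaging over positions is what lets sequences of any non-zero length share one output size without alignment, and the pooled vector returned by `embed` plays the role of a point in a shared feature space. Using a separate sigmoid per label, rather than a softmax, allows one protein to receive several labels at once, as is needed when a protein has multiple EC numbers or GO terms.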

[Figures: Figures 1 to 7, Scheme 1, and Figure 3 supplements 1 to 12; images are available in the full text.]

Similar articles

1
ProteInfer, deep neural networks for protein functional inference.
Elife. 2023 Feb 27;12:e80942. doi: 10.7554/eLife.80942.
2
ProALIGN: Directly Learning Alignments for Protein Structure Prediction via Exploiting Context-Specific Alignment Motifs.
J Comput Biol. 2022 Feb;29(2):92-105. doi: 10.1089/cmb.2021.0430. Epub 2022 Jan 21.
3
Deep learning enables high-quality and high-throughput prediction of enzyme commission numbers.
Proc Natl Acad Sci U S A. 2019 Jul 9;116(28):13996-14001. doi: 10.1073/pnas.1821905116. Epub 2019 Jun 20.
4
SUPERMAGO: Protein Function Prediction Based on Transformer Embeddings.
Proteins. 2025 May;93(5):981-996. doi: 10.1002/prot.26782. Epub 2024 Dec 22.
5
High precision in protein contact prediction using fully convolutional neural networks and minimal sequence features.
Bioinformatics. 2018 Oct 1;34(19):3308-3315. doi: 10.1093/bioinformatics/bty341.
6
MEGA-GO: functions prediction of diverse protein sequence length using Multi-scalE Graph Adaptive neural network.
Bioinformatics. 2025 Feb 4;41(2). doi: 10.1093/bioinformatics/btaf032.
7
DeepSS2GO: protein function prediction from secondary structure.
Brief Bioinform. 2024 Mar 27;25(3). doi: 10.1093/bib/bbae196.
8
ILMCNet: A Deep Neural Network Model That Uses PLM to Process Features and Employs CRF to Predict Protein Secondary Structure.
Genes (Basel). 2024 Oct 21;15(10):1350. doi: 10.3390/genes15101350.
9
Flattening the curve-How to get better results with small deep-mutational-scanning datasets.
Proteins. 2024 Jul;92(7):886-902. doi: 10.1002/prot.26686. Epub 2024 Mar 19.
10
Prediction of interresidue contacts with DeepMetaPSICOV in CASP13.
Proteins. 2019 Dec;87(12):1092-1099. doi: 10.1002/prot.25779. Epub 2019 Jul 27.

Cited by

1
Prediction of enzyme function using an interpretable optimized ensemble learning framework.
Chem Sci. 2025 Sep 1. doi: 10.1039/d5sc04513d.
2
Deciphering enzymatic potential in metagenomic reads through DNA language models.
Nucleic Acids Res. 2025 Aug 27;53(16). doi: 10.1093/nar/gkaf836.
3
Comparative transcriptomics analysis of the G20 biofilms grown on copper and polycarbonate surfaces.
Biofilm. 2025 Aug 6;10:100309. doi: 10.1016/j.bioflm.2025.100309. eCollection 2025 Dec.
4
Protein functional site annotation using local structure embeddings.
Proc Natl Acad Sci U S A. 2025 Aug 26;122(34):e2513219122. doi: 10.1073/pnas.2513219122. Epub 2025 Aug 20.
5
EZpred: improving deep learning-based enzyme function prediction using unlabeled sequence homologs.
bioRxiv. 2025 Jul 14:2025.07.09.663945. doi: 10.1101/2025.07.09.663945.
6
RC-GNN: A predictive model of enzyme-reaction pairs.
bioRxiv. 2025 Jun 27:2025.06.22.660952. doi: 10.1101/2025.06.22.660952.
7
Semi-supervised data-integrated feature importance enhances performance and interpretability of biological classification tasks.
Bioinformatics. 2025 Jul 1;41(Supplement_1):i373-i381. doi: 10.1093/bioinformatics/btaf190.
8
Chromosome-segment scanning for gain- or loss-of-function screening (CHASING).
iScience. 2025 Apr 17;28(5):112484. doi: 10.1016/j.isci.2025.112484. eCollection 2025 May 16.
9
Annotating the microbial dark matter with HiFi-NN.
iScience. 2025 Apr 18;28(6):112480. doi: 10.1016/j.isci.2025.112480. eCollection 2025 Jun 20.
10
Genetic and Microbial Analysis of Invasiveness for Escherichia coli Strains Associated With Inflammatory Bowel Disease.
Cell Mol Gastroenterol Hepatol. 2025;19(4):101451. doi: 10.1016/j.jcmgh.2024.101451. Epub 2024 Dec 27.

References

1
Using deep learning to annotate the protein universe.
Nat Biotechnol. 2022 Jun;40(6):932-937. doi: 10.1038/s41587-021-01179-w. Epub 2022 Feb 21.
2
ProteinBERT: a universal deep-learning model of protein sequence and function.
Bioinformatics. 2022 Apr 12;38(8):2102-2110. doi: 10.1093/bioinformatics/btac020.
3
De novo protein design by deep network hallucination.
Nature. 2021 Dec;600(7889):547-552. doi: 10.1038/s41586-021-04184-w. Epub 2021 Dec 1.
4
Disease variant prediction with deep generative models of evolutionary data.
Nature. 2021 Nov;599(7883):91-95. doi: 10.1038/s41586-021-04043-8. Epub 2021 Oct 27.
5
ProtTrans: Toward Understanding the Language of Life Through Self-Supervised Learning.
IEEE Trans Pattern Anal Mach Intell. 2022 Oct;44(10):7112-7127. doi: 10.1109/TPAMI.2021.3095381. Epub 2022 Sep 14.
6
PredictProtein - Predicting Protein Structure and Function for 29 Years.
Nucleic Acids Res. 2021 Jul 2;49(W1):W535-W540. doi: 10.1093/nar/gkab354.
7
Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences.
Proc Natl Acad Sci U S A. 2021 Apr 13;118(15). doi: 10.1073/pnas.2016239118.
8
Low-N protein engineering with data-efficient deep learning.
Nat Methods. 2021 Apr;18(4):389-396. doi: 10.1038/s41592-021-01100-y. Epub 2021 Apr 7.
9
Deep diversification of an AAV capsid protein by machine learning.
Nat Biotechnol. 2021 Jun;39(6):691-696. doi: 10.1038/s41587-020-00793-4. Epub 2021 Feb 11.
10
Embeddings from deep learning transfer GO annotations beyond homology.
Sci Rep. 2021 Jan 13;11(1):1160. doi: 10.1038/s41598-020-80786-0.