Protein engineering in the deep learning era.

Author information

Zhou Bingxin, Tan Yang, Hu Yutong, Zheng Lirong, Zhong Bozitao, Hong Liang

Affiliations

Institute of Natural Sciences, Shanghai Jiao Tong University, Shanghai, China.

Shanghai National Center for Applied Mathematics (SJTU Center), Shanghai Jiao Tong University, Shanghai, China.

Publication information

mLife. 2024 Dec 26;3(4):477-491. doi: 10.1002/mlf2.12157. eCollection 2024 Dec.

Abstract

Advances in deep learning have significantly aided protein engineering in addressing challenges in industrial production, healthcare, and environmental sustainability. This review frames frequently researched problems in protein understanding and engineering from the perspective of deep learning. It provides a thorough discussion of representation methods for protein sequences and structures, along with general encoding pipelines that support both pre-training and supervised learning tasks. We summarize state-of-the-art protein language models, geometric deep learning techniques, and combinations of distinct approaches for learning from multi-modal biological data. Additionally, we outline common downstream tasks and relevant benchmark datasets for training and evaluating deep learning models, focusing on the particular needs of protein engineering applications, such as identifying mutation sites and predicting properties for the virtual screening of candidates. This review offers biologists the latest tools to assist their engineering projects while providing a clear and comprehensive guide for computer scientists to develop more powerful solutions by standardizing problem formulation and consolidating data resources. Future research can expect a deeper integration of the biology and computer science communities, unleashing the full potential of deep learning in protein engineering and driving new scientific breakthroughs.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1fc2/11685842/46b27818e3b0/MLF2-3-477-g001.jpg
