Protein A-like Peptide Design Based on Diffusion and ESM2 Models.

Affiliations

Department of Pharmaceutics, Beijing Institute of Petrochemical Technology, Beijing 102627, China.

Department of Computer Science, Beijing Institute of Petrochemical Technology, Beijing 102627, China.

Publication information

Molecules. 2024 Oct 21;29(20):4965. doi: 10.3390/molecules29204965.

Abstract

Proteins are the foundation of life, and designing functional proteins remains a key challenge in biotechnology. Before AlphaFold2, protein design relied primarily on structure-centric approaches, such as those implemented in the well-known open-source software Rosetta3. After AlphaFold2, deep-learning techniques for protein design gained prominence. This study proposes a new method for generating functional proteins that combines a diffusion model with the ESM2 protein language model. Diffusion models, which are widely used in image and natural language generation, are applied here to protein design to enable controlled generation of new sequences. The ESM2 model, trained on large-scale protein sequence data, captures sequence context, which improves the ability of the combined model to generate biologically relevant proteins. Using the Protein A-like peptide as a model system, we combined the diffusion model and the ESM2 model to generate new peptide sequences from minimal input data and verified their biological activity experimentally, including by bio-layer interferometry (BLI) affinity testing. In conclusion, we developed a protein design method that offers a novel strategy for addressing the challenges of generic protein generation.
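The abstract does not describe how the diffusion model is set up, so the following is only a generic illustration of the forward (noising) half of a standard Gaussian diffusion process, applied to a one-hot encoded peptide. It is not the authors' pipeline. The linear noise schedule, the one-hot encoding, the toy sequence, and every function name here are assumptions chosen for illustration; a real system would presumably noise learned representations (for example ESM2 embeddings) and train a network to reverse the process.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def one_hot(sequence):
    """Encode a peptide as a list of 20-dimensional one-hot rows."""
    return [[1.0 if aa == ref else 0.0 for ref in AMINO_ACIDS] for aa in sequence]


def alpha_bar(t, num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_s) for s = 1..t.

    Assumes a linear beta schedule (an illustrative choice) and num_steps >= 2.
    """
    product = 1.0
    for s in range(1, t + 1):
        beta = beta_start + (beta_end - beta_start) * (s - 1) / (num_steps - 1)
        product *= 1.0 - beta
    return product


def forward_noise(x0, t, num_steps, rng):
    """Sample x_t from q(x_t | x_0) = N(sqrt(abar_t) * x_0, (1 - abar_t) * I)."""
    abar = alpha_bar(t, num_steps)
    signal, noise = math.sqrt(abar), math.sqrt(1.0 - abar)
    return [[signal * v + noise * rng.gauss(0.0, 1.0) for v in row] for row in x0]


def decode(x):
    """Map each row back to the residue with the largest value."""
    return "".join(
        AMINO_ACIDS[max(range(len(row)), key=row.__getitem__)] for row in x
    )


rng = random.Random(0)
peptide = "MKTAYIAKQR"  # arbitrary toy sequence, not taken from the paper
x0 = one_hot(peptide)
for t in (1, 500, 1000):
    noisy = forward_noise(x0, t, 1000, rng)
    print(t, round(alpha_bar(t, 1000), 4), decode(noisy))
```

At small `t` the decoded sequence matches the input because almost no noise has been added; by the final step the signal term is close to zero and the decoded residues are essentially random. A generative model would be trained to run this process in reverse, starting from noise.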

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1f0d/11510650/8eb0c3b46bd7/molecules-29-04965-g001.jpg
