ProtGPT2 is a deep unsupervised language model for protein design.

Affiliations

Department of Biochemistry, University of Bayreuth, Bayreuth, Germany.

Institute of Informatics and Applications, University of Girona, Girona, Spain.

Publication information

Nat Commun. 2022 Jul 27;13(1):4348. doi: 10.1038/s41467-022-32007-7.

Abstract

Protein design aims to build novel proteins customized for specific purposes, thereby holding the potential to tackle many environmental and biomedical problems. Recent progress in Transformer-based architectures has enabled the implementation of language models capable of generating text with human-like capabilities. Here, motivated by this success, we describe ProtGPT2, a language model trained on the protein space that generates de novo protein sequences following the principles of natural ones. The generated proteins display natural amino acid propensities, while disorder predictions indicate that 88% of ProtGPT2-generated proteins are globular, in line with natural sequences. Sensitive sequence searches in protein databases show that ProtGPT2 sequences are distantly related to natural ones, and similarity networks further demonstrate that ProtGPT2 is sampling unexplored regions of protein space. AlphaFold prediction of ProtGPT2-sequences yields well-folded non-idealized structures with embodiments and large loops and reveals topologies not captured in current structure databases. ProtGPT2 generates sequences in a matter of seconds and is freely available.
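The claim that generated proteins display natural amino acid propensities can be illustrated with a small composition comparison. The sketch below is not the authors' analysis: the toy sequences, the function names, and the use of total variation distance as the comparison metric are all assumptions made for illustration.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_frequencies(sequences):
    """Return the relative frequency of each standard residue across sequences."""
    counts = Counter()
    for seq in sequences:
        # Ignore any character that is not one of the 20 standard residues.
        counts.update(ch for ch in seq.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return {aa: 0.0 for aa in AMINO_ACIDS}
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def total_variation_distance(freq_a, freq_b):
    """Half the L1 distance between two frequency tables (0 = identical)."""
    return 0.5 * sum(abs(freq_a[aa] - freq_b[aa]) for aa in AMINO_ACIDS)


# Hypothetical toy sequences, not taken from the paper or any database.
generated = ["MKVLAAGIVG", "MSTEELLKKA"]
natural = ["MKTAYIAKQR", "MSLEELLKKA"]

gen_freq = amino_acid_frequencies(generated)
nat_freq = amino_acid_frequencies(natural)
print(round(total_variation_distance(gen_freq, nat_freq), 3))
```

A distance near 0 would indicate that the two sets share a similar residue composition. A meaningful comparison would need far larger sets than these toy inputs.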

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1cc9/9329459/abda3bfda9b1/41467_2022_32007_Fig1_HTML.jpg
