Evolutionary-scale prediction of atomic-level protein structure with a language model.

Author information

FAIR, Meta AI, New York, NY, USA.

New York University, New York, NY, USA.

Publication information

Science. 2023 Mar 17;379(6637):1123-1130. doi: 10.1126/science.ade2574. Epub 2023 Mar 16.

Abstract

Recent advances in machine learning have leveraged evolutionary information in multiple sequence alignments to predict protein structure. We demonstrate direct inference of full atomic-level protein structure from primary sequence using a large language model. As language models of protein sequences are scaled up to 15 billion parameters, an atomic-resolution picture of protein structure emerges in the learned representations. This results in an order-of-magnitude acceleration of high-resolution structure prediction, which enables large-scale structural characterization of metagenomic proteins. We apply this capability to construct the ESM Metagenomic Atlas by predicting structures for >617 million metagenomic protein sequences, including >225 million that are predicted with high confidence, which gives a view into the vast breadth and diversity of natural proteins.
