Bochtler Matthias
International Institute of Molecular and Cell Biology in Warsaw, Warsaw, Poland.
Institute of Biochemistry and Biophysics, Warsaw, Poland.
Bioessays. 2025 Jan;47(1):e2400155. doi: 10.1002/bies.202400155. Epub 2024 Oct 15.
The performance of deep Neural Networks (NNs) in the text (ChatGPT) and image (DALL-E2) domains has attracted worldwide attention. Convolutional NNs (CNNs), Large Language Models (LLMs), Denoising Diffusion Probabilistic Models (DDPMs)/Noise Conditional Score Networks (NCSNs), and Graph NNs (GNNs) have impacted computer vision, language editing and translation, automated conversation, image generation, and social network management. Proteins can be viewed as texts written with the alphabet of amino acids, as images, or as graphs of interacting residues. Each of these perspectives suggests the use of tools from a different area of deep learning for protein structural biology. Here, I review how CNNs, LLMs, DDPMs/NCSNs, and GNNs have led to major advances in protein structure prediction, inverse folding, protein design, and small molecule design. This review is primarily intended as a deep learning primer for practicing experimental structural biologists. However, extensive references to the deep learning literature should also make it relevant to readers who have a background in machine learning, physics or statistics, and an interest in protein structural biology.