Generating tertiary protein structures via interpretable graph variational autoencoders.

Author information

Guo Xiaojie, Du Yuanqi, Tadepalli Sivani, Zhao Liang, Shehu Amarda

Affiliations

Department of Information Sciences and Technology, George Mason University, Fairfax, VA 22030, USA.

Department of Computer Science, George Mason University, Fairfax, VA 22030, USA.

Publication information

Bioinform Adv. 2021 Nov 29;1(1):vbab036. doi: 10.1093/bioadv/vbab036. eCollection 2021.

DOI:10.1093/bioadv/vbab036

PMID:36700110

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9710582/

Abstract

MOTIVATION

Modeling the structural plasticity of protein molecules remains challenging. Most research has focused on obtaining one biologically active structure. This includes the recent AlphaFold2 that has been hailed as a breakthrough for protein modeling. Computing one structure does not suffice to understand how proteins modulate their interactions and even evade our immune system. Revealing the structure space available to a protein remains challenging. Data-driven approaches that learn to generate tertiary structures are increasingly garnering attention. These approaches exploit the ability to represent tertiary structures as contact or distance maps and make direct analogies with images to harness convolution-based generative adversarial frameworks from computer vision. Since such opportunistic analogies do not allow capturing highly structured data, current deep models struggle to generate physically realistic tertiary structures.

RESULTS

We present novel deep generative models that build upon the graph variational autoencoder framework. In contrast to existing literature, we represent tertiary structures as 'contact' graphs, which allows us to leverage graph-generative deep learning. Our models capture rich local and distal constraints and additionally compute disentangled latent representations that reveal the impact of individual latent factors. This elucidates what the factors control and makes our models more interpretable. Rigorous comparative evaluation along various metrics shows that the models we propose advance the state of the art. While there is still much ground to cover, the work presented here is an important first step, and graph-generative frameworks promise to get us to our goal of unraveling the exquisite structural complexity of protein molecules.
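
To make the 'contact' graph representation concrete, the following is a minimal sketch, not the authors' implementation. It assumes one 3D coordinate per residue (for example the C-alpha atom) and a hypothetical 8 angstrom distance cutoff; neither choice is stated in the abstract, and the function name `contact_graph` is illustrative only.

```python
import numpy as np


def contact_graph(coords, threshold=8.0):
    """Binary adjacency matrix: 1 where two residues lie closer than `threshold`.

    `coords` is an (n_residues, 3) array of per-residue positions.
    The 8.0 angstrom default is an assumed cutoff, not taken from the paper.
    """
    coords = np.asarray(coords, dtype=float)
    # Pairwise difference vectors via broadcasting, shape (n, n, 3)
    diff = coords[:, None, :] - coords[None, :, :]
    # Euclidean distance between every pair of residues, shape (n, n)
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    adj = (dist < threshold).astype(np.int8)
    # A residue is not considered in contact with itself
    np.fill_diagonal(adj, 0)
    return adj


# Toy input: four hypothetical residues spaced 3.8 angstroms apart on a line
toy = np.array([
    [0.0, 0.0, 0.0],
    [3.8, 0.0, 0.0],
    [7.6, 0.0, 0.0],
    [11.4, 0.0, 0.0],
])
adj = contact_graph(toy)
```

In this toy chain the first and last residues are 11.4 angstroms apart, so they are the only pair without an edge. A graph-generative model would operate on adjacency matrices of this kind rather than treating the map as an image.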

AVAILABILITY AND IMPLEMENTATION

Code is available at https://github.com/anonymous1025/CO-VAE.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics Advances online.
