A hybrid variational autoencoder and WGAN with gradient penalty for tertiary protein structure generation.

Author information

Sehsah Aalaa I, Mousa Afaf, Farouk Gamal

Affiliations

Department of Computer Science, Faculty of Computers and Information, Kafrelsheikh University, Kafr El Sheikh, 33516, Egypt.

Department of Computer Science, Faculty of Computers and Information, Menoufia University, Shebin El Kom, 32511, Egypt.

Publication information

Sci Rep. 2025 Apr 23;15(1):14191. doi: 10.1038/s41598-025-94747-y.

DOI: 10.1038/s41598-025-94747-y

PMID: 40268976

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12019360/

Abstract

Elucidating the tertiary structure of proteins is important for understanding their functions and interactions. While deep neural networks have advanced the prediction of a protein's native structure from its amino acid sequence, the focus on a single-structure view limits our understanding of the dynamic nature of protein molecules. Acquiring a multi-structure view of protein molecules remains a broader challenge in computational structural biology. Alternative representations, such as distance matrices, offer a compact and effective way to explore and generate realistic tertiary protein structures. This paper presents TP-VWGAN, a hybrid model that improves the realism of generated distance matrix representations of tertiary protein structures. The model combines the probabilistic representation learning of the Variational Autoencoder (VAE) with the strength in realistic data generation of the Wasserstein Generative Adversarial Network with Gradient Penalty (WGAN-GP). The main modification in TP-VWGAN is the incorporation of residual blocks into its VAE architecture to improve its performance. The experimental results show that TP-VWGAN, both with and without residual blocks, outperforms existing methods in generating realistic protein structures, and that incorporating residual blocks enhances its ability to capture key structural features. Benchmarking against existing methods also shows that the more accurately a model learns the symmetry features of the generated distance matrices, the better it captures key structural features. This work moves us closer to more advanced deep generative models that can explore a broader range of protein structures and be applied to drug design and protein engineering. The code and data are available at https://github.com/aalaa-sehsah/tp-vwgan.
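
The abstract rests on two properties of the distance matrix representation: a tertiary structure can be encoded as the matrix of pairwise residue distances, and any valid such matrix is symmetric with a zero diagonal. The minimal Python sketch below illustrates both ideas. It is not taken from the TP-VWGAN repository; it assumes one 3D coordinate per residue (for example the C-alpha atom), and the helper names `distance_matrix`, `symmetry_error`, and `symmetrize` are hypothetical. The symmetry measure used here (mean absolute difference between D and its transpose) is one simple choice and not necessarily the metric the authors report.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]
Matrix = List[List[float]]


def distance_matrix(coords: Sequence[Coord]) -> Matrix:
    """Pairwise Euclidean distances, one row/column per residue (assumed C-alpha coordinates)."""
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) for j in range(n)] for i in range(n)]


def symmetry_error(d: Matrix) -> float:
    """Mean |D[i][j] - D[j][i]| over all entries; 0.0 for a perfectly symmetric matrix."""
    n = len(d)
    if n == 0:
        return 0.0
    total = sum(abs(d[i][j] - d[j][i]) for i in range(n) for j in range(n))
    return total / (n * n)


def symmetrize(d: Matrix) -> Matrix:
    """Hypothetical post-processing: average D with its transpose and zero the diagonal."""
    n = len(d)
    return [
        [0.0 if i == j else (d[i][j] + d[j][i]) / 2.0 for j in range(n)]
        for i in range(n)
    ]


if __name__ == "__main__":
    # Three toy residues spaced 3.8 Å apart (a typical consecutive C-alpha distance).
    coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0)]
    real = distance_matrix(coords)
    print(symmetry_error(real))  # 0.0: matrices computed from coordinates are symmetric

    # A made-up, slightly asymmetric "generated" matrix standing in for model output.
    generated = [[0.0, 4.0, 5.0], [3.6, 0.1, 3.8], [5.5, 3.8, 0.0]]
    print(round(symmetry_error(generated), 3))  # 0.2
    print(symmetrize(generated))
```

Averaging with the transpose is only one way to force symmetry after the fact. Following the abstract's argument, a model that learns symmetry on its own should need little such correction, so a low `symmetry_error` on raw, unprocessed outputs is arguably the more informative signal.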

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1a83/12019360/dc2577642955/41598_2025_94747_Fig1_HTML.jpg

Similar articles

Clustering Analysis via Deep Generative Models With Mixture Models.

IEEE Trans Neural Netw Learn Syst. 2022 Jan;33(1):340-350. doi: 10.1109/TNNLS.2020.3027761. Epub 2022 Jan 5.

VAE-WACGAN: An Improved Data Augmentation Method Based on VAEGAN for Intrusion Detection.

Sensors (Basel). 2024 Sep 18;24(18):6035. doi: 10.3390/s24186035.

WGAN-GP_Glu: A semi-supervised model based on double generator-Wasserstein GAN with gradient penalty algorithm for glutarylation site identification.

Comput Biol Med. 2025 Jan;184:109328. doi: 10.1016/j.compbiomed.2024.109328. Epub 2024 Nov 14.

Generating tertiary protein structures via interpretable graph variational autoencoders.

Bioinform Adv. 2021 Nov 29;1(1):vbab036. doi: 10.1093/bioadv/vbab036. eCollection 2021.

Data generation for connected and automated vehicle tests using deep learning models.

Accid Anal Prev. 2023 Sep;190:107192. doi: 10.1016/j.aap.2023.107192. Epub 2023 Jun 26.

Deep clustering analysis via variational autoencoder with Gamma mixture latent embeddings.

Neural Netw. 2025 Mar;183:106979. doi: 10.1016/j.neunet.2024.106979. Epub 2024 Dec 4.

Generative Adversarial Learning of Protein Tertiary Structures.

Molecules. 2021 Feb 24;26(5):1209. doi: 10.3390/molecules26051209.

Data Size and Quality Matter: Generating Physically-Realistic Distance Maps of Protein Tertiary Structures.

Biomolecules. 2022 Jun 29;12(7):908. doi: 10.3390/biom12070908.

druGAN: An Advanced Generative Adversarial Autoencoder Model for de Novo Generation of New Molecules with Desired Molecular Properties in Silico.

Mol Pharm. 2017 Sep 5;14(9):3098-3104. doi: 10.1021/acs.molpharmaceut.7b00346. Epub 2017 Aug 4.
