De Novo Protein Design for Novel Folds Using Guided Conditional Wasserstein Generative Adversarial Networks.

Affiliations

Department of Electrical and Computer Engineering, Texas A&M University, College Station, Texas 77843, United States.

TEES-AgriLife Center for Bioinformatics and Genomic Systems Engineering, Texas A&M University, College Station, Texas 77840, United States.

Publication information

J Chem Inf Model. 2020 Dec 28;60(12):5667-5681. doi: 10.1021/acs.jcim.0c00593. Epub 2020 Sep 30.

DOI:10.1021/acs.jcim.0c00593
PMID:32945673
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7775287/
Abstract

Although massive amounts of protein sequence and structure data are quickly accumulating, the number of protein architectural types (or structural folds) remains small and limited. This study addresses the following question: how well can one reveal the underlying sequence-structure relationships and design protein sequences for an arbitrary, potentially novel, structural fold? In response, we have developed novel deep generative models, namely semisupervised gcWGAN (guided, conditional, Wasserstein Generative Adversarial Networks). To overcome training difficulties and improve design quality, we build our models on the conditional Wasserstein GAN (WGAN), which uses the Wasserstein distance in its loss function. Our major contributions include (1) constructing a low-dimensional and generalizable representation of the fold space for the input, (2) developing an ultrafast sequence-to-fold predictor (or oracle) and incorporating its feedback into the WGAN as a loss to guide model training, and (3) exploiting sequence data with and without paired structures to enable a semisupervised training strategy. Assessed by the oracle over more than 100 novel folds not in the training set, gcWGAN generates more successful designs and covers 3.5 times more target folds than a competing data-driven method (cVAE). Assessed by sequence- and structure-based predictors, gcWGAN designs are physically and biologically sound. Assessed by a structure predictor over representative novel folds, including one that is not even among the basis folds, gcWGAN designs have comparable or better fold accuracy yet much greater sequence diversity and novelty than cVAE designs. The ultrafast data-driven model is further shown to boost the success of a principle-driven de novo method (RosettaDesign) by generating design seeds and tailoring the design space. In conclusion, gcWGAN explores uncharted sequence space to design proteins by learning generalizable principles from current sequence-structure data.
Data, source code, and trained models are available at https://github.com/Shen-Lab/gcWGAN.
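The abstract says only that gcWGAN builds on a conditional WGAN, whose loss uses the Wasserstein distance, and adds the oracle's feedback as an extra loss term; it does not give the exact formulas. The following is a minimal sketch, assuming the standard WGAN critic and generator objectives and assuming the oracle term is a cross-entropy on the target fold. The function names and the weight `guidance_weight` are hypothetical, and critic scores and oracle probabilities are passed in as plain numbers so the arithmetic is visible; in a real model they would come from the critic network and the oracle. For the authors' actual implementation, see the repository linked above.

```python
import math
from statistics import fmean
from typing import Sequence


def critic_loss(real_scores: Sequence[float], fake_scores: Sequence[float]) -> float:
    """Standard WGAN critic objective (to be minimized): E[D(fake|y)] - E[D(real|y)].

    With a 1-Lipschitz critic (enforced in practice by weight clipping or a
    gradient penalty, both omitted here), the negated value estimates the
    Wasserstein-1 distance between real and generated sequence distributions.
    """
    return fmean(fake_scores) - fmean(real_scores)


def generator_loss(
    fake_scores: Sequence[float],
    oracle_target_probs: Sequence[float],
    guidance_weight: float = 1.0,  # hypothetical weight, not a value from the paper
    eps: float = 1e-12,
) -> float:
    """Hypothetical guided generator loss (an assumption, not the paper's formula).

    adversarial term: -E[D(G(z, y) | y)], the standard WGAN generator loss
    guidance term:    -E[log p_oracle(y | G(z, y))], i.e. cross-entropy of the
                      oracle's predicted probability for the target fold y
    """
    adversarial = -fmean(fake_scores)
    guidance = -fmean([math.log(max(p, eps)) for p in oracle_target_probs])
    return adversarial + guidance_weight * guidance
```

For example, `critic_loss([1.0, 3.0], [0.0, 1.0])` returns -1.5, and `generator_loss([0.0], [0.5])` returns log 2 (about 0.693). The guidance term grows as the oracle assigns a lower probability to the intended fold, so minimizing it pushes generated sequences toward the conditioning fold while the adversarial term keeps them close to the distribution of real sequences.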

Similar articles

1
De Novo Protein Design for Novel Folds Using Guided Conditional Wasserstein Generative Adversarial Networks.
J Chem Inf Model. 2020 Dec 28;60(12):5667-5681. doi: 10.1021/acs.jcim.0c00593. Epub 2020 Sep 30.
2
Rectified Wasserstein Generative Adversarial Networks for Perceptual Image Restoration.
IEEE Trans Pattern Anal Mach Intell. 2023 Mar;45(3):3648-3663. doi: 10.1109/TPAMI.2022.3185316. Epub 2023 Feb 3.
3
Fold2Seq: A Joint Sequence(1D)-Fold(3D) Embedding-based Generative Model for Protein Design.
Proc Mach Learn Res. 2021 Jul;139:1261-1271.
4
Network-principled deep generative models for designing drug combinations as graph sets.
Bioinformatics. 2020 Jul 1;36(Suppl_1):i445-i454. doi: 10.1093/bioinformatics/btaa317.
5
Prediction and analysis of multiple protein lysine modified sites based on conditional wasserstein generative adversarial networks.
BMC Bioinformatics. 2021 Mar 31;22(1):171. doi: 10.1186/s12859-021-04101-y.
6
Towards Generating Realistic Wrist Pulse Signals Using Enhanced One Dimensional Wasserstein GAN.
Sensors (Basel). 2023 Jan 28;23(3):1450. doi: 10.3390/s23031450.
7
Generative Adversarial Networks for De Novo Molecular Design.
Mol Inform. 2021 Oct;40(10):e2100045. doi: 10.1002/minf.202100045. Epub 2021 Jul 6.
8
Parameter-Transferred Wasserstein Generative Adversarial Network (PT-WGAN) for Low-Dose PET Image Denoising.
IEEE Trans Radiat Plasma Med Sci. 2021 Mar;5(2):213-223. doi: 10.1109/trpms.2020.3025071. Epub 2020 Sep 21.
9
Exploring "dark-matter" protein folds using deep learning.
Cell Syst. 2024 Oct 16;15(10):898-910.e5. doi: 10.1016/j.cels.2024.09.006. Epub 2024 Oct 8.
10
Generative Adversarial Networks and Conditional Random Fields for Hyperspectral Image Classification.
IEEE Trans Cybern. 2020 Jul;50(7):3318-3329. doi: 10.1109/TCYB.2019.2915094. Epub 2019 May 30.

Cited by

1
The development of the generative adversarial supporting vector machine for molecular property generation.
J Cheminform. 2025 Jul 7;17(1):100. doi: 10.1186/s13321-025-01052-x.
2
AlphaFold distillation for inverse protein design.
Sci Rep. 2025 Jul 1;15(1):21743. doi: 10.1038/s41598-025-00436-1.
3
A review of machine learning methods for imbalanced data challenges in chemistry.
