UnCorrupt SMILES: a novel approach to de novo design.

Authors

Schoenmaker Linde, Béquignon Olivier J M, Jespers Willem, van Westen Gerard J P

Affiliation

Computational Drug Discovery, Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Einsteinweg 55, Leiden, The Netherlands.

Publication details

J Cheminform. 2023 Feb 14;15(1):22. doi: 10.1186/s13321-023-00696-x.

DOI:10.1186/s13321-023-00696-x
PMID:36788579
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9926805/
Abstract

Generative deep learning models have emerged as a powerful approach for de novo drug design because they help researchers find new molecules with desired properties. Despite continuous improvements in the field, a subset of the outputs that sequence-based de novo generators produce cannot be progressed due to errors. Here, we propose to fix these invalid outputs post hoc. In similar tasks, transformer models from the field of natural language processing have been shown to be very effective. A transformer model was therefore trained to translate invalid Simplified Molecular-Input Line-Entry System (SMILES) strings into valid representations. The performance of this SMILES corrector was evaluated on four representative methods of de novo generation: a recurrent neural network (RNN), a target-directed RNN, a generative adversarial network (GAN), and a variational autoencoder (VAE). This study found that the percentage of invalid outputs from these specific generative models ranges from 4% to 89%, with different models having different error-type distributions. Post hoc correction of SMILES was shown to increase model validity. The SMILES corrector trained with one error per input alters 60-90% of invalid generator outputs and fixes 35-80% of them. However, higher error detection and performance were obtained for transformer models trained with multiple errors per input. In this case, the best model was able to correct 60-95% of invalid generator outputs. Further analysis showed that these fixed molecules are comparable to the correct molecules from the de novo generators in terms of novelty and similarity. Additionally, the SMILES corrector can be used to expand the number of interesting new molecules within the targeted chemical space. Introducing different errors into existing molecules yields novel analogs with a uniqueness of 39% and a novelty of approximately 20%. The results of this research demonstrate that SMILES correction is a viable post hoc extension and can enhance the search for better drug candidates.
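
The abstract describes training a corrector on invalid inputs that contain one or several errors. The sketch below illustrates one way such (invalid, valid) training pairs could be built by corrupting known-good SMILES. It is an assumption-laden illustration, not the authors' procedure: the tokenizer, the three edit operations, the small replacement vocabulary, and the names `tokenize`, `passes_syntax_check`, `corrupt`, and `make_pairs` are all hypothetical. The check only tests bracket and ring-closure bookkeeping; a real pipeline would more likely rely on a cheminformatics toolkit (for example, RDKit's `Chem.MolFromSmiles` returns `None` for strings it cannot parse).

```python
import random
import re

# Simplified SMILES tokenizer (assumption): bracket atoms, two-letter
# halogens, %NN ring closures, then any single character.
TOKEN_RE = re.compile(r"\[[^\]]*\]|Br|Cl|%\d{2}|.")


def tokenize(smiles):
    """Split a SMILES string into tokens."""
    return TOKEN_RE.findall(smiles)


def passes_syntax_check(smiles):
    """Cheap syntax test: balanced parentheses and paired ring labels.

    This is NOT a chemical validity check (no valence or aromaticity rules).
    """
    depth = 0
    open_rings = set()
    for tok in tokenize(smiles):
        if tok == "(":
            depth += 1
        elif tok == ")":
            depth -= 1
            if depth < 0:
                return False
        elif tok.isdigit() or tok.startswith("%"):
            # A ring label opens on first sight and closes on the second.
            open_rings.symmetric_difference_update({tok.lstrip("%")})
    return depth == 0 and not open_rings


def corrupt(smiles, n_errors=1, rng=None):
    """Return a copy of `smiles` with `n_errors` random token edits."""
    rng = rng or random.Random()
    tokens = tokenize(smiles)
    if not tokens:
        raise ValueError("cannot corrupt an empty SMILES string")
    vocab = ["C", "N", "O", "c", "(", ")", "1", "2", "="]  # hypothetical
    for _ in range(n_errors):
        op = rng.choice(["delete", "insert", "replace"])
        if op == "delete" and len(tokens) > 1:
            del tokens[rng.randrange(len(tokens))]
        elif op == "insert":
            tokens.insert(rng.randrange(len(tokens) + 1), rng.choice(vocab))
        else:
            # Also used when a delete would empty the token list.
            tokens[rng.randrange(len(tokens))] = rng.choice(vocab)
    return "".join(tokens)


def make_pairs(valid_smiles, n_errors=1, seed=0, max_tries=20):
    """Build (corrupted, original) pairs, keeping only corruptions that
    fail the syntax check; inputs with no failing corruption are skipped."""
    rng = random.Random(seed)
    pairs = []
    for smi in valid_smiles:
        for _ in range(max_tries):
            bad = corrupt(smi, n_errors, rng)
            if not passes_syntax_check(bad):
                pairs.append((bad, smi))
                break
    return pairs


if __name__ == "__main__":
    for bad, good in make_pairs(["c1ccccc1", "CC(=O)O"], n_errors=2):
        print(f"{bad!r} -> {good!r}")
```

Raising `n_errors` mimics the "multiple errors per input" training setting mentioned in the abstract. Because random edits can leave a string that is still syntactically fine, the pair builder retries and keeps only corruptions that fail the check.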

Fig. 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b933/9926805/7aaf6e30eb2b/13321_2023_696_Fig1_HTML.jpg
Fig. 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b933/9926805/07a383124245/13321_2023_696_Fig2_HTML.jpg
Fig. 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b933/9926805/ac4812379576/13321_2023_696_Fig3_HTML.jpg
Fig. 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b933/9926805/c7960ad1fe72/13321_2023_696_Fig4_HTML.jpg
Fig. 5: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b933/9926805/0d0492c45fda/13321_2023_696_Fig5_HTML.jpg

Similar articles

1
UnCorrupt SMILES: a novel approach to de novo design.
J Cheminform. 2023 Feb 14;15(1):22. doi: 10.1186/s13321-023-00696-x.
2
GEN: highly efficient SMILES explorer using autodidactic generative examination networks.
J Cheminform. 2020 Apr 10;12(1):22. doi: 10.1186/s13321-020-00425-8.
3
FSM-DDTR: End-to-end feedback strategy for multi-objective De Novo drug design using transformers.
Comput Biol Med. 2023 Sep;164:107285. doi: 10.1016/j.compbiomed.2023.107285. Epub 2023 Jul 31.
4
Generative Pre-trained Transformer (GPT) based model with relative attention for de novo drug design.
Comput Biol Chem. 2023 Oct;106:107911. doi: 10.1016/j.compbiolchem.2023.107911. Epub 2023 Jun 29.
5
Bidirectional Molecule Generation with Recurrent Neural Networks.
J Chem Inf Model. 2020 Mar 23;60(3):1175-1183. doi: 10.1021/acs.jcim.9b00943. Epub 2020 Jan 16.
6
CONSMI: Contrastive Learning in the Simplified Molecular Input Line Entry System Helps Generate Better Molecules.
Molecules. 2024 Jan 19;29(2):495. doi: 10.3390/molecules29020495.
7
Adversarial Threshold Neural Computer for Molecular de Novo Design.
Mol Pharm. 2018 Oct 1;15(10):4386-4397. doi: 10.1021/acs.molpharmaceut.7b01137. Epub 2018 Mar 30.
8
Generative Adversarial Networks for De Novo Molecular Design.
Mol Inform. 2021 Oct;40(10):e2100045. doi: 10.1002/minf.202100045. Epub 2021 Jul 6.
9
Automated Generation of Novel Fragments Using Screening Data, a Dual SMILES Autoencoder, Transfer Learning and Syntax Correction.
J Chem Inf Model. 2021 Jun 28;61(6):2547-2559. doi: 10.1021/acs.jcim.0c01226. Epub 2021 May 24.
10
Improving Chemical Autoencoder Latent Space and Molecular Generation Diversity with Heteroencoders.
Biomolecules. 2018 Oct 30;8(4):131. doi: 10.3390/biom8040131.

Cited by

1
The future of pharmaceuticals: Artificial intelligence in drug discovery and development.
J Pharm Anal. 2025 Aug;15(8):101248. doi: 10.1016/j.jpha.2025.101248. Epub 2025 Feb 26.
2
Generative artificial intelligence based models optimization towards molecule design enhancement.
J Cheminform. 2025 Aug 4;17(1):116. doi: 10.1186/s13321-025-01059-4.
3
A systematic review of deep learning chemical language models in recent era.
