Molecular Sets (MOSES): A Benchmarking Platform for Molecular Generation Models.

Author information

Polykovskiy Daniil, Zhebrak Alexander, Sanchez-Lengeling Benjamin, Golovanov Sergey, Tatanov Oktai, Belyaev Stanislav, Kurbanov Rauf, Artamonov Aleksey, Aladinskiy Vladimir, Veselov Mark, Kadurin Artur, Johansson Simon, Chen Hongming, Nikolenko Sergey, Aspuru-Guzik Alán, Zhavoronkov Alex

Affiliations

Insilico Medicine Hong Kong Ltd., Pak Shek Kok, Hong Kong.

Chemistry and Chemical Biology Department, Harvard University, Cambridge, MA, United States.

Publication information

Front Pharmacol. 2020 Dec 18;11:565644. doi: 10.3389/fphar.2020.565644. eCollection 2020.

DOI:10.3389/fphar.2020.565644

PMID:33390943

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC7775580/

Abstract

Generative models are becoming a tool of choice for exploring the molecular space. These models learn from a large training dataset and produce novel molecular structures with similar properties. Generated structures can be used for virtual screening or for training semi-supervised predictive models in downstream tasks. While there are plenty of generative models, it is unclear how to compare and rank them. In this work, we introduce a benchmarking platform called Molecular Sets (MOSES) to standardize training and comparison of molecular generative models. MOSES provides training and testing datasets, and a set of metrics to evaluate the quality and diversity of generated structures. We have implemented and compared several molecular generation models and suggest using our results as reference points for further advancements in generative chemistry research. The platform and source code are available at https://github.com/molecularsets/moses.
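
To make the idea of set-level metrics for generated structures concrete, here is a minimal sketch of two such measures: the fraction of generated strings that are distinct, and the fraction of distinct generated strings not seen in training. The abstract does not name or define the metrics, so the function names `uniqueness` and `novelty`, their exact definitions, and the toy SMILES lists are assumptions made for illustration. This is not the MOSES API. The sketch compares raw strings, whereas a real pipeline would likely canonicalize structures first with a cheminformatics toolkit.

```python
from typing import Iterable, List


def uniqueness(generated: List[str]) -> float:
    """Fraction of generated strings that are distinct (hypothetical helper)."""
    if not generated:
        return 0.0
    return len(set(generated)) / len(generated)


def novelty(generated: Iterable[str], training: Iterable[str]) -> float:
    """Fraction of distinct generated strings absent from the training set
    (hypothetical helper; compares raw strings without canonicalization)."""
    unique_generated = set(generated)
    if not unique_generated:
        return 0.0
    return len(unique_generated - set(training)) / len(unique_generated)


# Toy data invented for illustration only; not taken from the MOSES datasets.
train_smiles = ["CCO", "c1ccccc1", "CC(=O)O"]
generated_smiles = ["CCO", "CCN", "CCN", "CCCl"]

uniq = uniqueness(generated_smiles)
nov = novelty(generated_smiles, train_smiles)
print(f"uniqueness={uniq:.2f} novelty={nov:.2f}")
```

Reporting both numbers together is a deliberate choice in this sketch: a model that repeats one new molecule scores well on novelty but poorly on uniqueness, so neither value alone describes diversity.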
