Polykovskiy Daniil, Zhebrak Alexander, Sanchez-Lengeling Benjamin, Golovanov Sergey, Tatanov Oktai, Belyaev Stanislav, Kurbanov Rauf, Artamonov Aleksey, Aladinskiy Vladimir, Veselov Mark, Kadurin Artur, Johansson Simon, Chen Hongming, Nikolenko Sergey, Aspuru-Guzik Alán, Zhavoronkov Alex
Insilico Medicine Hong Kong Ltd., Pak Shek Kok, Hong Kong.
Chemistry and Chemical Biology Department, Harvard University, Cambridge, MA, United States.
Front Pharmacol. 2020 Dec 18;11:565644. doi: 10.3389/fphar.2020.565644. eCollection 2020.
Generative models are becoming a tool of choice for exploring the molecular space. These models learn on a large training dataset and produce novel molecular structures with similar properties. Generated structures can be utilized for virtual screening or for training semi-supervised predictive models in downstream tasks. While there are plenty of generative models, it is unclear how to compare and rank them. In this work, we introduce a benchmarking platform called Molecular Sets (MOSES) to standardize training and comparison of molecular generative models. MOSES provides training and testing datasets, and a set of metrics to evaluate the quality and diversity of generated structures. We have implemented and compared several molecular generation models and suggest using our results as reference points for further advancements in generative chemistry research. The platform and source code are available at https://github.com/molecularsets/moses.
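The abstract does not enumerate the metrics, so the following is only an illustrative sketch of what evaluating generated structures against a training set can look like. It assumes molecules are given as SMILES strings and compares them as plain strings. The function names `uniqueness` and `novelty` and the toy data are hypothetical and are not taken from the MOSES code base. A real evaluation would first validate and canonicalize the structures with a cheminformatics toolkit.

```python
from typing import Iterable, List


def uniqueness(generated: List[str]) -> float:
    """Fraction of generated strings that are distinct (hypothetical helper)."""
    if not generated:
        return 0.0
    return len(set(generated)) / len(generated)


def novelty(generated: Iterable[str], training: Iterable[str]) -> float:
    """Fraction of distinct generated strings absent from the training set.

    Assumption: strings are compared verbatim, with no canonicalization.
    """
    unique_generated = set(generated)
    if not unique_generated:
        return 0.0
    return len(unique_generated - set(training)) / len(unique_generated)


# Toy data for illustration only; not drawn from the MOSES datasets.
train = ["CCO", "CCN", "c1ccccc1"]
generated = ["CCO", "CCC", "CCC", "CCCl"]

print(uniqueness(generated))      # 3 distinct strings out of 4
print(novelty(generated, train))  # 2 of the 3 distinct strings are not in train
```

In this simplified form a model that copies its training data scores low on novelty, and one that repeats a few outputs scores low on uniqueness, which is the kind of quality and diversity signal the abstract refers to.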