Novartis Institutes for BioMedical Research, Novartis Pharma AG, Novartis Campus, 4002 Basel, Switzerland.
AI Innovation Lab, Novartis Pharma AG, Dublin 4, Ireland.
J Chem Inf Model. 2023 Aug 14;63(15):4497-4504. doi: 10.1021/acs.jcim.3c00523. Epub 2023 Jul 24.
Machine-learning and deep-learning models have been used extensively in cheminformatics to predict molecular properties, reduce the need for direct measurements, and accelerate compound prioritization. However, the variety of setups and frameworks, together with the large number of available molecular representations, makes it difficult to properly evaluate, reproduce, and compare these models. Here we present PREFER, a new PREdictive modeling FramEwoRk for molecular discovery, written in Python (version 3.7.7) and based on AutoSklearn (version 0.14.7), that enables comparison between different molecular representations and common machine-learning models. We provide an overview of the framework's design and show example use cases and results for several representation-model combinations on diverse data sets, both public and in-house. Finally, we discuss the use of PREFER on small data sets. The code of the framework is freely available on GitHub.
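The central idea of comparing representation-model combinations can be sketched in plain Python: every pair is scored on the same cross-validation splits, so differences in the scores reflect the pairing rather than the data partition. This is a minimal illustration under stated assumptions, not PREFER's code or API, and it does not use AutoSklearn. The featurizers (`atom_counts`, `length_feature`), the models (`mean_baseline`, `nearest_neighbor`), the helpers `kfold_indices` and `compare`, and the toy SMILES and target values are all hypothetical stand-ins chosen for illustration.

```python
from itertools import product
from statistics import mean
from typing import Callable, Dict, List, Sequence, Tuple

Vector = List[float]
Featurizer = Callable[[str], Vector]
Model = Callable[[List[Vector], List[float], List[Vector]], List[float]]


# Hypothetical toy featurizers standing in for real molecular representations
# (e.g. fingerprints or descriptors); they only inspect the SMILES text.
def atom_counts(smiles: str) -> Vector:
    """Count a few uppercase element letters (crude: 'Cl' also counts as 'C')."""
    return [float(smiles.count(symbol)) for symbol in ("C", "N", "O")]


def length_feature(smiles: str) -> Vector:
    """Use the SMILES string length as a single, deliberately weak feature."""
    return [float(len(smiles))]


# Hypothetical toy regressors standing in for real machine-learning models.
def mean_baseline(x_train: List[Vector], y_train: List[float], x_test: List[Vector]) -> List[float]:
    """Predict the training-set mean for every test point."""
    m = mean(y_train)
    return [m for _ in x_test]


def nearest_neighbor(x_train: List[Vector], y_train: List[float], x_test: List[Vector]) -> List[float]:
    """Predict the target of the closest training point (squared Euclidean distance)."""
    preds = []
    for x in x_test:
        dists = [sum((a - b) ** 2 for a, b in zip(x, xt)) for xt in x_train]
        preds.append(y_train[dists.index(min(dists))])
    return preds


def kfold_indices(n: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Deterministic k-fold split: sample i goes to test fold i % k."""
    return [
        ([i for i in range(n) if i % k != f], [i for i in range(n) if i % k == f])
        for f in range(k)
    ]


def compare(
    smiles: Sequence[str],
    y: Sequence[float],
    representations: Dict[str, Featurizer],
    models: Dict[str, Model],
    k: int = 3,
) -> Dict[Tuple[str, str], float]:
    """Mean absolute error of every representation-model pair on shared CV splits."""
    splits = kfold_indices(len(smiles), k)  # identical splits keep results comparable
    results = {}
    for (rep_name, featurize), (model_name, model) in product(representations.items(), models.items()):
        x = [featurize(s) for s in smiles]
        errors = []
        for train, test in splits:
            preds = model([x[i] for i in train], [y[i] for i in train], [x[i] for i in test])
            errors.extend(abs(p - y[i]) for p, i in zip(preds, test))
        results[(rep_name, model_name)] = mean(errors)
    return results


REPRESENTATIONS = {"atom_counts": atom_counts, "length": length_feature}
MODELS = {"mean": mean_baseline, "1-NN": nearest_neighbor}

if __name__ == "__main__":
    # Made-up toy molecules and target values, not real measurements.
    SMILES = ["CCO", "CCN", "CCCC", "c1ccccc1", "CC(=O)O", "CN"]
    TARGETS = [0.5, 0.7, 1.2, 2.0, 0.3, 0.6]
    for (rep, model), mae in sorted(compare(SMILES, TARGETS, REPRESENTATIONS, MODELS).items()):
        print(f"{rep:12s} {model:5s} MAE={mae:.3f}")
```

Fixing the splits once and reusing them for every pair is the design choice that makes the resulting table a fair comparison. In a real setup, the hand-written regressors would be replaced by proper models (per the abstract, PREFER builds on AutoSklearn for this), and the toy featurizers by genuine molecular representations.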