Correia João, Capela João, Rocha Miguel
CEB - Centre of Biological Engineering, University of Minho, Braga, Portugal.
LABBELS - Associate Laboratory, Braga/Guimarães, Portugal.
J Cheminform. 2024 Dec 5;16(1):136. doi: 10.1186/s13321-024-00937-7.
The domain of computational chemistry has experienced a significant evolution due to the introduction of Machine Learning (ML) technologies. Despite the potential of ML to revolutionize the field, researchers are often encumbered by obstacles, such as the complexity of selecting optimal algorithms, the automation of data pre-processing steps, the necessity for adaptive feature engineering, and the assurance of model performance consistency across different datasets. Addressing these issues head-on, DeepMol stands out as an Automated ML (AutoML) tool by automating critical steps of the ML pipeline. DeepMol rapidly and automatically identifies the most effective data representation, pre-processing methods and model configurations for a specific molecular property/activity prediction problem. On 22 benchmark datasets, DeepMol obtained competitive pipelines compared with those requiring time-consuming feature engineering, model design and selection processes. As one of the first AutoML tools specifically developed for the computational chemistry domain, DeepMol stands out with its open-source code, in-depth tutorials, detailed documentation, and examples of real-world applications, all available at https://github.com/BioSystemsUM/DeepMol and https://deepmol.readthedocs.io/en/latest/ . By introducing AutoML as a groundbreaking feature in computational chemistry, DeepMol establishes itself as the pioneering state-of-the-art tool in the field.
Scientific contribution
DeepMol aims to provide an integrated framework of AutoML for computational chemistry. DeepMol provides a more robust alternative to other tools with its integrated pipeline serialization, enabling seamless deployment using the fit, transform, and predict paradigms. It uniquely supports both conventional and deep learning models for regression, classification and multi-task problems, offering unmatched flexibility compared to other AutoML tools. DeepMol's predefined configurations and customizable objective functions make it accessible to users at all skill levels while enabling efficient and reproducible workflows. Benchmarking on diverse datasets demonstrated its ability to deliver optimized pipelines and superior performance across various molecular machine-learning tasks.
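The fit, transform, and predict paradigm with pipeline serialization mentioned above can be illustrated with a minimal, standard-library-only sketch. This is not DeepMol's API: the class names (`ScalerStep`, `LinearModelStep`, `ToyPipeline`), the JSON-based serialization, and the toy one-descriptor dataset are all assumptions made for illustration. The abstract does not state which serialization mechanism DeepMol uses.

```python
import json
from statistics import mean, pstdev


class ScalerStep:
    """Hypothetical transformer: learns mean/std in fit, applies them in transform."""

    def fit(self, xs):
        self.mean_ = mean(xs)
        self.std_ = pstdev(xs) or 1.0  # fall back to 1.0 for constant input
        return self

    def transform(self, xs):
        return [(x - self.mean_) / self.std_ for x in xs]


class LinearModelStep:
    """Hypothetical predictor: one-feature least-squares line."""

    def fit(self, xs, ys):
        x_bar, y_bar = mean(xs), mean(ys)
        sxx = sum((x - x_bar) ** 2 for x in xs)
        sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
        self.slope_ = sxy / sxx if sxx else 0.0
        self.intercept_ = y_bar - self.slope_ * x_bar
        return self

    def predict(self, xs):
        return [self.slope_ * x + self.intercept_ for x in xs]


class ToyPipeline:
    """Chains one transformer and one model behind a single fit/predict interface."""

    def __init__(self):
        self.scaler = ScalerStep()
        self.model = LinearModelStep()

    def fit(self, xs, ys):
        self.scaler.fit(xs)
        self.model.fit(self.scaler.transform(xs), ys)
        return self

    def predict(self, xs):
        return self.model.predict(self.scaler.transform(xs))

    def to_json(self):
        # Serialize only the learned parameters (assumed format, illustration only).
        return json.dumps({"scaler": vars(self.scaler), "model": vars(self.model)})

    @classmethod
    def from_json(cls, payload):
        state = json.loads(payload)
        pipeline = cls()
        vars(pipeline.scaler).update(state["scaler"])
        vars(pipeline.model).update(state["model"])
        return pipeline


# Toy data (made up): one numeric descriptor x and a property y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

pipeline = ToyPipeline().fit(xs, ys)
restored = ToyPipeline.from_json(pipeline.to_json())  # simulate save and reload
predictions = restored.predict([5.0])
print(predictions)
```

The point of the sketch is that the restored pipeline carries its fitted pre-processing state along with the model, so new inputs pass through the same transformation before prediction without refitting anything.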