Data Driven Drug Design, Center for Bioinformatics, Saarland University, Saarbrücken 66123, Germany.
BASF SE, Ludwigshafen 67056, Germany.
J Chem Inf Model. 2024 Aug 26;64(16):6259-6280. doi: 10.1021/acs.jcim.4c00747. Epub 2024 Aug 13.
Molecular property prediction (MPP) is vital for drug discovery, crop protection, and environmental science. Over the last decades, diverse computational techniques have been developed, ranging from statistical models and classical machine learning built on simple physicochemical properties and molecular fingerprints to advanced deep learning approaches. In this review, we aim to distill insights from current research on employing transformer models for MPP. We analyze the currently available models and explore key questions that arise when training and fine-tuning a transformer model for MPP. These questions encompass the choice and scale of the pretraining data, the selection of a suitable architecture, and promising pretraining objectives. Our analysis highlights areas not yet covered in current research, inviting further exploration to deepen the field's understanding. Additionally, we address the challenges in comparing different models, emphasizing the need for standardized data splitting and robust statistical analysis.