Department of Life Science Informatics and Data Science, B-IT, Lamarr Institute for Machine Learning and Artificial Intelligence, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Friedrich-Hirzebruch-Allee 5/6, 53115, Bonn, Germany.
Sci Rep. 2023 Sep 26;13(1):16145. doi: 10.1038/s41598-023-43046-5.
For many machine learning applications in drug discovery, only limited amounts of training data are available. This typically applies to compound design and activity prediction and often restricts machine learning, especially deep learning. For low-data applications, specialized learning strategies can be considered to reduce the amount of training data required. Among these is meta-learning, which attempts to enable learning in low-data regimes by combining outputs of different models and utilizing meta-data from these predictions. However, in drug discovery settings, meta-learning is still in its infancy. In this study, we have explored meta-learning for the prediction of potent compounds via generative design using transformer models. For different activity classes, meta-learning models were derived to predict highly potent compounds from weakly potent templates in the presence of varying amounts of fine-tuning data and were compared to other transformers developed for this task. Meta-learning consistently led to statistically significant improvements in model performance, particularly when fine-tuning data were limited. Moreover, meta-learning models generated target compounds with higher potency and larger potency differences between templates and targets than other transformers, indicating their potential for low-data compound design.