Modeling and Informatics, Merck & Co., Inc., South San Francisco, California 94080, United States.
J Chem Inf Model. 2024 Aug 26;64(16):6324-6337. doi: 10.1021/acs.jcim.4c00639. Epub 2024 Aug 7.
Predicting absorption, distribution, metabolism, excretion, and toxicity (ADMET) properties of small molecules is a key task in drug discovery. A major challenge in building better ADMET models is the experimental error inherent in the data. Furthermore, because the data are continuous, ADMET prediction is typically a regression task, which makes it difficult to apply existing denoising methods from other domains, as these largely focus on classification. Here, we develop denoising schemes based on deep learning to address this problem. We find that the training error (TE) can be used to identify noise in regression tasks, whereas ensemble-based and forgotten event-based metrics fail to detect it. The most significant performance increase occurs when the original model is finetuned with the denoised data, using TE as the noise detection metric. Our method improves models trained on data with medium noise and does not degrade the performance of models whose noise lies outside this range (the low-noise and high-noise regimes). To our knowledge, our denoising scheme is the first to improve model performance for ADMET data, and it has implications for improving models of experimental assay data in general.
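The general idea of TE-based denoising followed by finetuning can be illustrated with a small sketch. This is not the authors' implementation: it assumes a plain linear model trained by gradient descent in place of a deep network, synthetic data in place of ADMET measurements, and a hypothetical `drop_fraction` parameter (10% here) to decide how many high-error samples to treat as noisy. All function names are illustrative.

```python
import numpy as np

def fit_linear(X, y, w=None, lr=0.1, epochs=200):
    """Gradient descent on mean squared error.
    Passing an existing w continues training from it (a stand-in for finetuning)."""
    n, d = X.shape
    w = np.zeros(d) if w is None else w.copy()
    for _ in range(epochs):
        grad = 2.0 * X.T @ (X @ w - y) / n
        w -= lr * grad
    return w

def training_error(X, y, w):
    """Per-sample absolute training error (TE) of the fitted model."""
    return np.abs(X @ w - y)

def keep_mask_by_training_error(X, y, w, drop_fraction=0.1):
    """Boolean mask that drops the drop_fraction of samples with the highest TE.
    drop_fraction is an assumed, hypothetical threshold, not a value from the paper."""
    te = training_error(X, y, w)
    cutoff = np.quantile(te, 1.0 - drop_fraction)
    return te <= cutoff

# Synthetic regression data with a subset of heavily corrupted labels.
rng = np.random.default_rng(0)
n, d = 500, 5
X = rng.normal(size=(n, d))
true_w = rng.normal(size=d)
y = X @ true_w + rng.normal(scale=0.1, size=n)          # small baseline noise
noisy_idx = rng.choice(n, size=50, replace=False)
y[noisy_idx] += rng.normal(scale=3.0, size=50)          # injected label noise

# 1) Train the original model on all (noisy) data.
w_orig = fit_linear(X, y)

# 2) Flag high-TE samples as likely noise and drop them.
keep = keep_mask_by_training_error(X, y, w_orig, drop_fraction=0.1)

# 3) Finetune the original model on the denoised subset.
w_ft = fit_linear(X[keep], y[keep], w=w_orig)

flagged = np.flatnonzero(~keep)
precision = np.isin(flagged, noisy_idx).mean()
print("flagged samples:", flagged.size)
print("fraction of flagged samples that were truly corrupted:", precision)
print("weight error before:", np.linalg.norm(w_orig - true_w))
print("weight error after: ", np.linalg.norm(w_ft - true_w))
```

In this toy setting the ground-truth weights are known, so the effect of denoising can be checked directly; with real assay data one would instead compare held-out performance before and after finetuning.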