Kamuntavičius Gintautas, Paquet Tanya, Bastas Orestis, Šalkauskas Dainius, Prat Alvaro, Aty Hisham Abdel, Pabrinkis Aurimas, Norvaišas Povilas, Tal Roy
AI Chemistry, Ro5, 2801 Gateway Drive, 75063, Irving, TX, USA.
J Cheminform. 2025 Jul 21;17(1):108. doi: 10.1186/s13321-025-01041-0.
This study, focusing on predicting Absorption, Distribution, Metabolism, Excretion, and Toxicology (ADMET) properties, addresses key challenges of machine learning (ML) models trained on ligand-based representations. We propose a structured approach to feature selection that goes beyond the conventional practice of combining different representations without systematic reasoning. We also strengthen model evaluation by integrating cross-validation with statistical hypothesis testing, which adds a layer of reliability to model assessments. Our final evaluation covers a practical scenario in which models trained on one data source are evaluated on another. Together, these steps aim to make ADMET predictions more reliable and model evaluations more dependable and informative.
Scientific contribution
This study provides a structured approach to feature selection. We improve model evaluation by combining cross-validation with statistical hypothesis testing, making results more reliable. The methodology can be generalized beyond feature selection, increasing confidence in the selected models, which is crucial in a noisy domain such as ADMET prediction. We also assess how well models trained on one dataset perform on another, offering practical insights into using external data in drug discovery.
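The abstract does not name the hypothesis test that is combined with cross-validation. As a minimal sketch under that caveat, the Python below swaps in an exact paired sign-flip permutation test. Two models (for example, one trained on each of two candidate feature sets) are scored on the same cross-validation folds, and the test asks whether the mean per-fold difference could plausibly arise by chance. The function name `paired_sign_flip_test` and the example scores are hypothetical illustrations, not the authors' code or results.

```python
from itertools import product
from statistics import mean


def paired_sign_flip_test(scores_a, scores_b):
    """Exact two-sided paired permutation (sign-flip) test.

    Hypothetical helper: scores_a and scores_b hold one metric value per
    cross-validation fold, computed on the SAME folds, so they are paired.
    Returns (mean per-fold difference, p-value).
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need equal-length, non-empty per-fold score lists")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(mean(diffs))
    # Under H0 (the two models perform equally), each paired difference is
    # equally likely to be positive or negative, so enumerate all 2**k
    # sign patterns and count those at least as extreme as the observed one.
    extreme = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        stat = abs(mean(s * d for s, d in zip(signs, diffs)))
        if stat >= observed - 1e-12:  # small tolerance for float round-off
            extreme += 1
        total += 1
    return mean(diffs), extreme / total


# Illustrative per-fold scores (made up) for two feature sets on 5 folds.
diff, p_value = paired_sign_flip_test(
    [0.71, 0.69, 0.74, 0.72, 0.70],
    [0.68, 0.67, 0.70, 0.69, 0.69],
)
print(f"mean difference = {diff:.3f}, p = {p_value:.4f}")
```

Pairing matters because both models see identical folds, so fold-to-fold difficulty largely cancels out in the differences. With only five folds the exact test has 2^5 = 32 sign patterns, so the smallest attainable two-sided p-value is 2/32 = 0.0625; finer resolution needs more folds, or repeated cross-validation paired with a test that accounts for overlapping training sets.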