Codicè Francesco, Pancotti Corrado, Rollo Cesare, Moreau Yves, Fariselli Piero, Raimondi Daniele
Department of Medical Sciences, University of Torino, 10123, Torino, Italy.
ESAT-STADIUS, KU Leuven, Leuven, 3001, Belgium.
J Cheminform. 2025 Mar 14;17(1):33. doi: 10.1186/s13321-025-00972-y.
Precision oncology plays a pivotal role in contemporary healthcare, aiming to optimize treatments for each patient based on their unique characteristics. This objective has spurred the emergence of various cancer cell line drug response datasets, driven by the need to facilitate pre-clinical studies by exploring the impact of multi-omics data on drug response. Despite the proliferation of machine learning models for Drug Response Prediction (DRP), their validation remains critical for reliably assessing their usefulness for drug discovery and precision oncology, and their actual ability to generalize over the immense space of cancer cells and chemical compounds.

Scientific contribution: In this paper, we show that the commonly used evaluation strategies for DRP methods can easily be fooled by commonly occurring dataset biases ("specification gaming"), and are therefore unable to truly measure the ability of DRP methods to generalize over drugs and cell lines. This problem hinders the development of reliable DRP methods and their application to experimental pipelines. Here we propose a new validation protocol composed of three Aggregation Strategies (Global, Fixed-Drug, and Fixed-Cell Line), integrated with three of the most commonly used train-test evaluation settings, to ensure a truly realistic assessment of prediction performance. We also scrutinize the challenges associated with using IC50 as a prediction label, showing how its close correlation with the drug concentration ranges worsens the risk of misleading performance assessment, which gives an additional reason to replace it with the Area Under the Dose-Response Curve.
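The abstract names the three Aggregation Strategies but does not define them, so the following is only a minimal sketch under stated assumptions. It assumes that Global pools all drug-cell line pairs into one correlation, that Fixed-Drug averages per-drug correlations computed across cell lines, and that Fixed-Cell Line averages per-cell-line correlations computed across drugs. The function names, the choice of Pearson correlation, the handling of undefined correlations, and the toy data are all illustrative and are not taken from the paper.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation; None when undefined (<2 points or zero variance)."""
    n = len(xs)
    if n < 2:
        return None
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)


def aggregate_scores(records):
    """Score predictions under three assumed aggregation strategies.

    records: iterable of (drug, cell_line, y_true, y_pred) tuples.
    """
    records = list(records)

    def grouped_mean(key_index):
        # Group by drug (index 0) or cell line (index 1), score each group,
        # then average the groups whose correlation is defined.
        groups = defaultdict(list)
        for rec in records:
            groups[rec[key_index]].append(rec)
        scores = [
            pearson([r[2] for r in g], [r[3] for r in g])
            for g in groups.values()
        ]
        defined = [s for s in scores if s is not None]
        return sum(defined) / len(defined) if defined else None

    return {
        "global": pearson([r[2] for r in records], [r[3] for r in records]),
        "fixed_drug": grouped_mean(0),
        "fixed_cell_line": grouped_mean(1),
    }


# Hypothetical toy data: two drugs with very different average responses.
# The "model" is a trivial baseline that predicts each drug's mean response
# and ignores the cell line entirely.
toy = [
    ("drugA", "c1", 1.0, 1.0), ("drugA", "c2", 1.2, 1.0), ("drugA", "c3", 0.8, 1.0),
    ("drugB", "c1", 5.0, 5.0), ("drugB", "c2", 5.2, 5.0), ("drugB", "c3", 4.8, 5.0),
]
scores = aggregate_scores(toy)
print(scores)
```

On this toy data the baseline scores a Global correlation above 0.99 and a Fixed-Cell Line correlation of 1.0. Its Fixed-Drug score is undefined because its predictions are constant within each drug, so it carries no information about which cell lines respond better to a given drug. This is the kind of bias-driven score inflation that per-drug aggregation is meant to expose.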