Department of Life Sciences and Medicine, University of Luxembourg, 2, place de l'Université, L-4365 Esch-sur-Alzette, Luxembourg.
Experimental Dermatology, Department of Dermatology, Technische Universität Dresden, 01307 Dresden, Germany.
Brief Bioinform. 2024 Sep 23;25(6). doi: 10.1093/bib/bbae567.
Stratification of patients diagnosed with cancer has become a major goal in personalized oncology. One important aspect is the accurate prediction of the response to various drugs. The molecular characteristics of cancer cells are expected to contain enough information to retrieve specific signatures, allowing accurate predictions based solely on these multi-omic data. Ideally, these predictions should be explainable to clinicians so that they can be integrated into patient care. We propose a machine-learning framework based on ensemble learning to integrate multi-omic data and predict sensitivity to an array of commonly used and experimental compounds, including chemotoxic compounds and targeted kinase inhibitors. We trained a set of classifiers on the different parts of our dataset to produce omic-specific signatures, then trained a random forest classifier on these signatures to predict drug responsiveness. We used the Cancer Cell Line Encyclopedia dataset, which comprises multi-omic and drug sensitivity measurements for hundreds of cell lines, to build the predictive models, and validated the results using nested cross-validation. Our results show good performance for several compounds (area under the receiver operating characteristic curve > 79%) across the most frequent cancer types. Furthermore, the simplicity of our approach allows us to examine which omic layers have greater importance in the models and to identify new putative markers of drug responsiveness. We propose several models based on small subsets of transcriptional markers with the potential to become useful tools in personalized oncology, paving the way for clinicians to use the molecular characteristics of tumors to predict sensitivity to therapeutic compounds.
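The two-level design described above (one classifier per omic layer, a random forest trained on their outputs, and nested cross-validation scored by ROC AUC) can be sketched with scikit-learn. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are synthetic rather than taken from the Cancer Cell Line Encyclopedia, the layer names and column ranges are hypothetical, and logistic regression is an assumed choice for the per-layer classifiers, which the abstract does not specify.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a multi-omic matrix: rows are cell lines,
# columns are features grouped into hypothetical omic layers.
rng = np.random.default_rng(0)
n_lines = 150
layers = {
    "expression": slice(0, 20),
    "methylation": slice(20, 35),
    "mutation": slice(35, 45),
}
X = rng.normal(size=(n_lines, 45))
# Synthetic binary "sensitive / resistant" label driven by one expression
# feature and one methylation feature, plus noise.
noise = rng.normal(scale=0.5, size=n_lines)
y = (X[:, 0] + 0.5 * X[:, 20] + noise > 0).astype(int)


def layer_model(cols):
    """Classifier that sees only the columns of one omic layer (assumed: logistic regression)."""
    select_and_scale = ColumnTransformer([("layer", StandardScaler(), cols)])
    return make_pipeline(select_and_scale, LogisticRegression(max_iter=1000))


# Level 1: one classifier per layer, each yielding an out-of-fold probability
# (the layer's "signature"). Level 2: a random forest trained on those signatures.
stack = StackingClassifier(
    estimators=[(name, layer_model(cols)) for name, cols in layers.items()],
    final_estimator=RandomForestClassifier(n_estimators=50, random_state=0),
    stack_method="predict_proba",
    cv=3,
)

# Nested cross-validation: the inner loop tunes the random forest,
# the outer loop estimates ROC AUC on folds never used for tuning.
inner = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
outer = StratifiedKFold(n_splits=3, shuffle=True, random_state=1)
search = GridSearchCV(
    stack,
    param_grid={"final_estimator__max_depth": [2, None]},
    scoring="roc_auc",
    cv=inner,
)
auc_scores = cross_val_score(search, X, y, scoring="roc_auc", cv=outer)

# Refit on all data to read how much the random forest relies on each layer.
stack.fit(X, y)
layer_importance = dict(zip(layers, stack.final_estimator_.feature_importances_))

print("Outer-fold ROC AUC:", np.round(auc_scores, 3))
print("Layer importance:", {k: round(float(v), 3) for k, v in layer_importance.items()})
```

Because the second-level random forest receives exactly one input per omic layer in this sketch, its feature importances can be read directly as a rough measure of each layer's contribution, which mirrors the kind of layer-level interpretation the abstract mentions.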