Joint Research Center for Computational Biomedicine, RWTH Aachen University, Aachen, Germany.
Aachen Institute for Advanced Study in Computational Engineering Science (AICES), RWTH Aachen University, Aachen, Germany.
PLoS One. 2020 Nov 23;15(11):e0238961. doi: 10.1371/journal.pone.0238961. eCollection 2020.
Drug sensitivity prediction models for human cancer cell lines are important tools for identifying potential computational biomarkers of drug responsiveness in a pre-clinical setting. Integrating information derived from a range of heterogeneous data is crucial but remains non-trivial, as differences in data structure may prevent fitting algorithms from assigning adequate weights to complementary information contained in distinct omics data. To counteract this effect, which tends to lead to a single data type dominating supposedly multi-omics models, we developed a novel tool that enables users to train single-omics models separately in a first step and to integrate them into a multi-omics model in a second step. Extensive ablation studies are performed to facilitate an in-depth evaluation of the respective contributions of individual data types and of combinations thereof, effectively identifying redundancies and interdependencies between them. Moreover, the integration of the single-omics models is realized by a range of distinct classification algorithms, thus allowing for a performance comparison. Sets of molecular events and tissue types found to be related to significant shifts in drug sensitivity are returned to facilitate a comprehensive and straightforward analysis of potential computational biomarkers for drug responsiveness. Our two-step approach yields sets of genuinely multi-omics pan-cancer classification models that are highly predictive for a majority of drugs in the GDSC database. For targeted drugs with particular modes of action, its predictive performance compares favourably with that of classification models that incorporate multi-omics data in a simple one-step approach. Additionally, case studies demonstrate that it succeeds both in correctly identifying known key biomarkers for sensitivity towards specific drug compounds and in providing sets of potential candidates for additional computational biomarkers.
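The two-step idea described in the abstract (fit one model per omics data type, then integrate the per-type outputs with a second classifier) can be illustrated with a minimal sketch. This is not the authors' tool: the logistic regression used at both steps is a stand-in for whatever algorithms the tool actually offers, the block names `mutation` and `expression`, all function names, and the synthetic data are hypothetical, and no GDSC data is involved. For brevity the second step reuses in-sample scores; a real pipeline would more likely use held-out (cross-validated) scores to limit overfitting.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def decision(model, x):
    """Linear score (log-odds) of one sample under a (weights, bias) model."""
    w, b = model
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def fit_logistic(X, y, lr=0.5, epochs=300):
    """Batch gradient-descent logistic regression; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        # Prediction errors under the current parameters.
        errs = [sigmoid(decision((w, b), xi)) - yi for xi, yi in zip(X, y)]
        w = [wj - lr * sum(e * xi[j] for e, xi in zip(errs, X)) / n
             for j, wj in enumerate(w)]
        b -= lr * sum(errs) / n
    return w, b


def meta_features(single, blocks):
    """One column per omics block: that block's single-omics score per sample."""
    names = sorted(single)
    n = len(blocks[names[0]])
    return [[decision(single[k], blocks[k][i]) for k in names] for i in range(n)]


def fit_two_step(blocks, y):
    """Step 1: one model per omics block. Step 2: integrate their scores."""
    single = {name: fit_logistic(X, y) for name, X in blocks.items()}
    meta = fit_logistic(meta_features(single, blocks), y)
    return single, meta


def predict(single, meta, blocks):
    """Class labels (0/1) from the integrated second-step model."""
    return [int(decision(meta, row) > 0) for row in meta_features(single, blocks)]


def accuracy(pred, y):
    return sum(p == t for p, t in zip(pred, y)) / len(y)


# Synthetic stand-in data: a binary "mutation" block and a continuous
# "expression" block, each carrying one informative and one noise feature.
rng = random.Random(0)
y = [rng.randint(0, 1) for _ in range(300)]
blocks = {
    "mutation": [[t if rng.random() < 0.9 else 1 - t, int(rng.random() < 0.3)]
                 for t in y],
    "expression": [[rng.gauss(2 * t - 1, 1.0), rng.gauss(0.0, 1.0)] for t in y],
}

single, meta = fit_two_step(blocks, y)
two_step_acc = accuracy(predict(single, meta, blocks), y)

# A crude ablation-style comparison: each single-omics model on its own.
single_acc = {k: accuracy([int(decision(m, x) > 0) for x in blocks[k]], y)
              for k, m in single.items()}
print("two-step:", two_step_acc, "single-omics:", single_acc)
```

Because the second-step model sees only one score per data type, its learned weights give a direct, if rough, reading of how much each omics block contributes, which is the kind of question the ablation studies in the abstract address.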