Malik Vidhi, Kalakoti Yogesh, Sundar Durai
DAILAB, Department of Biochemical Engineering and Biotechnology, Indian Institute of Technology (IIT) Delhi, New Delhi, India.
BMC Genomics. 2021 Mar 24;22(1):214. doi: 10.1186/s12864-021-07524-2.
Survival and drug response are two highly emphasized clinical outcomes in cancer research that direct the prognosis of a cancer patient. Here, we propose a late multi-omics integrative framework that robustly quantifies survival and drug response for breast cancer patients, with a focus on the relative predictive ability of the available omics data types. Neighborhood component analysis (NCA), a supervised feature selection algorithm, selected relevant features from multi-omics datasets retrieved from The Cancer Genome Atlas (TCGA) and the Genomics of Drug Sensitivity in Cancer (GDSC) databases. A neural network framework, fed with the NCA-selected features, was used to develop survival and drug response prediction models for breast cancer patients. The drug response framework used regression and unsupervised clustering (K-means) to segregate samples into responders and non-responders based on their predicted IC50 values (Z-score).
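The last step described above, splitting samples into responders and non-responders by clustering predicted IC50 Z-scores, can be illustrated with a minimal sketch of K-means with k = 2 on one-dimensional scores. This is not the authors' code: the function names `kmeans_two_clusters_1d` and `split_responders` are hypothetical, the min/max centroid initialisation is an assumption chosen for determinism, and treating the lower-IC50 cluster as "responder" is an assumption based on the usual reading that a lower IC50 means greater drug sensitivity.

```python
from statistics import mean


def kmeans_two_clusters_1d(values, max_iter=100):
    """Lloyd's k-means with k = 2 on a 1-D list of scores.

    Centroids start at the minimum and maximum value (an assumption made
    here for determinism). Returns (labels, centroids); label 0 is the
    cluster with the lower centroid.
    """
    if len(values) < 2:
        raise ValueError("need at least two samples to form two clusters")
    centroids = [min(values), max(values)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each score joins its nearest centroid.
        new_labels = [0 if abs(v - centroids[0]) <= abs(v - centroids[1]) else 1
                      for v in values]
        # Update step: each centroid moves to the mean of its members
        # (an empty cluster keeps its previous centroid).
        new_centroids = []
        for k in (0, 1):
            members = [v for v, lab in zip(values, new_labels) if lab == k]
            new_centroids.append(mean(members) if members else centroids[k])
        # Converged once no sample changes cluster between iterations.
        converged = new_labels == labels
        labels, centroids = new_labels, new_centroids
        if converged:
            break
    return labels, centroids


def split_responders(predicted_ic50_z):
    """Label samples by the cluster of their predicted IC50 Z-score.

    Assumption (hypothetical rule): the low-IC50 cluster is "responder",
    because a lower IC50 means less drug is needed for 50% inhibition.
    """
    labels, _ = kmeans_two_clusters_1d(predicted_ic50_z)
    return ["responder" if lab == 0 else "non-responder" for lab in labels]


if __name__ == "__main__":
    # Hypothetical predicted IC50 Z-scores for six samples.
    scores = [-1.8, -1.5, -1.2, 0.3, 0.9, 1.4]
    print(split_responders(scores))
```

On these hypothetical scores, the sketch labels the first three samples as responders and the last three as non-responders. Because the data are one-dimensional and k = 2, the two clusters are always separated by a single threshold (the midpoint between the final centroids), which keeps the responder rule easy to interpret.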
The survival prediction framework was highly effective in categorizing patients into risk subtypes, with an accuracy of 94%. Compared with single-omics and early integration approaches, our drug response prediction models performed significantly better, predicting IC50 values (Z-score) with a mean squared error (MSE) of 1.154 and an overall regression value of 0.92, indicating a linear relationship between predicted and actual IC50 values.
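The two regression metrics quoted above can be computed as in the following minimal sketch. It assumes that the "overall regression value" is the Pearson correlation coefficient R between predicted and actual IC50 Z-scores; the abstract does not define the term, and the function names and example values here are hypothetical.

```python
from math import sqrt


def mean_squared_error(actual, predicted):
    """Mean of the squared residuals (actual - predicted)."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def regression_r(actual, predicted):
    """Pearson correlation coefficient R between actual and predicted values.

    Assumption: this is what the "overall regression value" refers to.
    """
    if len(actual) < 2 or len(actual) != len(predicted):
        raise ValueError("need at least two paired values")
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_a == 0 or var_p == 0:
        raise ValueError("R is undefined when either input is constant")
    return cov / sqrt(var_a * var_p)


if __name__ == "__main__":
    # Hypothetical actual vs. predicted IC50 Z-scores.
    actual = [-1.0, 0.0, 1.0, 2.0]
    predicted = [-0.5, 0.5, 1.0, 1.5]
    print(mean_squared_error(actual, predicted))  # 0.1875
    print(round(regression_r(actual, predicted), 3))  # 0.983
```

The two metrics complement each other: MSE is expressed in squared Z-score units and measures how far predictions fall from the actual values, while R is scale-free and measures how closely the predicted and actual values follow a straight line.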
The proposed omics integration strategy provides an effective way of extracting critical information from diverse omics data types, enabling the estimation of prognostic indicators. Such integrative models with high predictive power could have significant impact and practical utility in precision oncology.