Yang Chenyu, Liu Zhenhao, Dai Peibin, Zhang Yu, Huang Pengjie, Lin Yong, Xie Lu
School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China.
Institute for Genome and Bioinformatics, Shanghai Institute for Biomedical and Pharmaceutical Technologies, Shanghai 201203, China.
Sheng Wu Gong Cheng Xue Bao. 2022 Jun 25;38(6):2201-2212. doi: 10.13345/j.cjb.210676.
Predicting tumor drug sensitivity plays an important role in guiding clinical medication for patients. In this study, a cancer drug sensitivity prediction model based on multi-omics data was constructed using the Stacking ensemble learning method. Gene expression, mutation, copy number variation, and drug sensitivity values for 198 drugs were downloaded from the GDSC database. Multiple feature selection methods were applied for dimensionality reduction. Six primary learners and one secondary learner were integrated by the Stacking method, and the model was validated with 5-fold cross-validation. Among the resulting per-drug models, 36.4% had an AUC greater than 0.9, 49.0% had an AUC between 0.8 and 0.9, and the lowest AUC was 0.682. The Stacking-based multi-omics model outperformed known single-omics and multi-omics models in accuracy and stability, and models built on multi-omics data predicted drug sensitivity better than models built on single-omics data. Functional annotation and enrichment analysis of the feature genes revealed a potential mechanism of tumor resistance to sorafenib, which makes the model interpretable from a biological perspective and demonstrates its potential applicability in guiding clinical medication.
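The workflow summarized in the abstract (feature selection, six primary learners feeding one secondary learner, 5-fold cross-validation scored by AUC) can be illustrated with a minimal scikit-learn sketch. The abstract does not name the learners, the feature selection methods, or how sensitivity values were turned into class labels, so everything specific here is an assumption made for illustration: the six base classifiers, the logistic-regression secondary learner, the `SelectKBest` filter with `k=50`, and the synthetic binary data standing in for one drug's multi-omics matrix. It is not the authors' implementation.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for one drug's data set (assumption): rows are cell
# lines, columns are concatenated omics features, y is a binary
# sensitive/resistant label.
X, y = make_classification(
    n_samples=200, n_features=300, n_informative=15, random_state=0
)

# Six primary learners (hypothetical choices; the abstract does not list them).
primary_learners = [
    ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
    ("gb", GradientBoostingClassifier(n_estimators=50, random_state=0)),
    ("svc", SVC(probability=True, random_state=0)),
    ("knn", KNeighborsClassifier()),
    ("nb", GaussianNB()),
    ("dt", DecisionTreeClassifier(max_depth=5, random_state=0)),
]

# One secondary learner trained on out-of-fold predictions of the primaries.
stack = StackingClassifier(
    estimators=primary_learners,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)

# Scaling and feature selection live inside the pipeline, so they are
# refit on the training part of every cross-validation fold.
model = make_pipeline(StandardScaler(), SelectKBest(f_classif, k=50), stack)

# 5-fold cross-validation scored by ROC AUC.
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
auc_scores = cross_val_score(model, X, y, cv=outer_cv, scoring="roc_auc")
print(f"mean AUC over 5 folds: {auc_scores.mean():.3f}")
```

Keeping the feature selection step inside the pipeline is a deliberate choice in this sketch: selecting features on the full data set before cross-validation would leak label information into the held-out folds and inflate the reported AUC.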