Greehey Children's Cancer Research Institute, University of Texas Health Science Center at San Antonio, San Antonio, TX, 78229, USA.
Department of Electrical and Computer Engineering, The University of Texas at San Antonio, San Antonio, TX, 78249, USA.
BMC Med Genomics. 2019 Jan 31;12(Suppl 1):18. doi: 10.1186/s12920-018-0460-9.
The study of high-throughput genomic profiles from a pharmacogenomics viewpoint has provided unprecedented insights into the oncogenic features modulating drug response. A recent study screened the response of a thousand human cancer cell lines to a wide collection of anti-cancer drugs and illuminated the link between cellular genotypes and vulnerability. However, because of essential differences between cell lines and tumors, translating these findings into predictions of drug response in tumors remains challenging. Recent advances in deep learning have revolutionized bioinformatics and introduced new techniques for integrating genomic data. Applying deep learning to pharmacogenomics may bridge the gap between genomics and drug response and improve the prediction of drug response in tumors.
We propose a deep learning model, DeepDR, that predicts drug response from the mutation and expression profiles of a cancer cell or a tumor. The model contains three deep neural networks (DNNs): i) a mutation encoder pre-trained on a large pan-cancer dataset (The Cancer Genome Atlas; TCGA) to abstract core representations of high-dimensional mutation data, ii) a pre-trained expression encoder, and iii) a drug response predictor network that integrates the first two subnetworks. Given a pair of mutation and expression profiles, the model predicts IC50 values of 265 drugs. We trained and tested the model on a dataset of 622 cancer cell lines and achieved an overall mean squared error of 1.96 (log-scale IC50 values). In prediction error or stability, this performance was superior to that of two classical methods (linear regression and support vector machine) and four analogous DNN variants of DeepDR, including DNNs built without TCGA pre-training, DNNs partly replaced by principal components, and DNNs built on individual types of input data. We then applied the model to predict the drug response of 9059 tumors of 33 cancer types. In per-cancer and pan-cancer settings, the model predicted both known drug targets, including EGFR inhibitors in non-small cell lung cancer and tamoxifen in ER+ breast cancer, and novel ones, such as vinorelbine for TTN-mutated tumors. The comprehensive analysis further revealed the molecular mechanisms underlying resistance to the chemotherapeutic drug docetaxel in a pan-cancer setting and the anti-cancer potential of a novel agent, CX-5461, in treating gliomas and hematopoietic malignancies.
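The three-network layout described above can be illustrated with a minimal NumPy sketch of the forward pass. This is not the authors' implementation: the layer widths, input dimensions, random weights, and helper names (`init_mlp`, `forward`, `predict_ic50`, `mse`) are all hypothetical, and no TCGA pre-training or cell-line training is performed. Only the overall shape (two encoders feeding one predictor with 265 outputs, scored by mean squared error) follows the text.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_mlp(sizes):
    """Return a list of (W, b) pairs for a fully connected network."""
    return [(rng.normal(0.0, 0.01, size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(params, x, linear_output=False):
    """Apply each layer with ReLU; optionally keep the last layer linear."""
    last = len(params) - 1
    for i, (W, b) in enumerate(params):
        x = x @ W + b
        if not (linear_output and i == last):
            x = np.maximum(x, 0.0)
    return x

# Hypothetical sizes; only N_DRUGS = 265 comes from the abstract.
N_MUT, N_EXP, N_CODE, N_DRUGS = 1000, 800, 64, 265

# i) mutation encoder, ii) expression encoder. In the described model these
# would be pre-trained on TCGA; here they are randomly initialized.
mut_encoder = init_mlp([N_MUT, 256, N_CODE])
exp_encoder = init_mlp([N_EXP, 256, N_CODE])
# iii) predictor that takes the two encoded representations together.
predictor = init_mlp([2 * N_CODE, 128, N_DRUGS])

def predict_ic50(mut, exp):
    """Map paired mutation/expression profiles to 265 log-scale IC50 values."""
    code = np.concatenate(
        [forward(mut_encoder, mut), forward(exp_encoder, exp)], axis=1)
    return forward(predictor, code, linear_output=True)

def mse(pred, target):
    """Mean squared error, the metric reported in the abstract."""
    return float(np.mean((pred - target) ** 2))

# Synthetic stand-in data: 4 samples with binary mutations and real-valued expression.
mut = rng.integers(0, 2, size=(4, N_MUT)).astype(float)
exp = rng.normal(size=(4, N_EXP))
pred = predict_ic50(mut, exp)
```

Concatenating the two encoder outputs is one simple way to integrate the subnetworks; the abstract does not specify the merge operation, so that choice is an assumption of this sketch.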
To our knowledge, this is the first DNN model to translate pharmacogenomic features identified from in vitro drug screening into predictions of tumor response. The results cover both well-studied and novel mechanisms of drug resistance and drug targets. Our model and findings improve the prediction of drug response and the identification of novel therapeutic options.