Covell David G
Information Technology Branch, Developmental Therapeutics Program, National Cancer Institute, Frederick, MD, United States of America.
PLoS One. 2017 Aug 8;12(8):e0181991. doi: 10.1371/journal.pone.0181991. eCollection 2017.
A novel data mining procedure is proposed for identifying potential pathway-gene biomarkers from preclinical drug sensitivity data that can predict clinical responses to erlotinib or sorafenib. The analysis applies linear ridge regression modeling to generate a small (N≈1000) set of baseline gene expressions that jointly yield high-quality predictions of both preclinical drug sensitivity and clinical response. The pathway-gene combinations obtained from gene set enrichment analysis of this initial gene set are then grouped by standard clustering according to their shared appearance in molecular function pathways, which yields a reduced (N≈300) set of potential pathway-gene biomarkers. A modified method for quantifying pathway fitness is used to identify smaller sets of over- and under-expressed genes that correspond to favorable and unfavorable clinical responses. Detailed literature-based evidence is provided in support of the roles of these under- and over-expressed genes in compound efficacy. Random forest analysis of the potential pathway-gene biomarkers yields average treatment prediction errors of 10% and 22% for patients receiving erlotinib and sorafenib, respectively, who had a favorable clinical response. Higher errors were found for both compounds when predicting an unfavorable clinical response. Collectively, these results suggest complementary roles for biomarker genes and biomarker pathways when predicting clinical responses from preclinical data.
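The ridge regression step that produces the initial gene set can be illustrated with a minimal, dependency-free sketch. The abstract does not say how genes are ranked or how the penalty is chosen, so everything here is an assumption for illustration only: the closed-form ridge solution on centered data, ranking genes by the absolute value of their coefficient, the penalty `lam=1.0`, the cutoff `k`, the helper names (`solve`, `ridge_coefficients`, `top_genes`), and the hypothetical gene labels and toy data. In practice a library routine such as scikit-learn's `Ridge` would normally replace the hand-written solver.

```python
# Minimal sketch (hypothetical, not the paper's code): rank genes by the size of
# their ridge regression coefficients when predicting drug sensitivity from
# baseline expression. Assumptions: closed-form ridge on centered data, ranking
# by |coefficient|, arbitrary penalty and cutoff, toy data and gene names.
from itertools import product
from typing import List, Sequence

Matrix = List[List[float]]


def solve(a: Matrix, b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ridge_coefficients(x: Matrix, y: Sequence[float], lam: float) -> List[float]:
    """Return w minimizing ||y - Xw||^2 + lam * ||w||^2 on centered X and y."""
    n, p = len(x), len(x[0])
    col_means = [sum(row[j] for row in x) / n for j in range(p)]
    y_mean = sum(y) / n
    xc = [[row[j] - col_means[j] for j in range(p)] for row in x]
    yc = [v - y_mean for v in y]
    # Penalized normal equations: (X^T X + lam * I) w = X^T y
    xtx = [[sum(xc[i][j] * xc[i][k] for i in range(n)) + (lam if j == k else 0.0)
            for k in range(p)] for j in range(p)]
    xty = [sum(xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    return solve(xtx, xty)


def top_genes(coefs: Sequence[float], names: Sequence[str], k: int) -> List[str]:
    """Keep the k genes with the largest absolute ridge coefficients."""
    order = sorted(range(len(coefs)), key=lambda j: abs(coefs[j]), reverse=True)
    return [names[j] for j in order[:k]]


# Toy example: 8 hypothetical cell lines, 4 hypothetical genes.
names = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
x = [[5 + a, 5 + b, 5 + c, 5 + a * b] for a, b, c in product((-1, 1), repeat=3)]
y = [2.0 * row[0] - 1.5 * row[2] for row in x]  # "sensitivity" driven by A and C
w = ridge_coefficients(x, y, lam=1.0)
print(top_genes(w, names, k=2))  # ['GENE_A', 'GENE_C']
```

Because the toy design makes the centered gene columns orthogonal, each ridge coefficient is the least-squares value shrunk by a factor of 8/9 (for example, 2 becomes 16/9), which shows the shrinkage that ridge regression applies when many correlated genes compete to explain the same response.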