Aben Nanne, Vis Daniel J, Michaut Magali, Wessels Lodewyk F A
Division of Molecular Carcinogenesis, The Netherlands Cancer Institute, Amsterdam 1066CX, The Netherlands, Faculty of EEMCS, Delft University of Technology, Delft 2628CD, The Netherlands.
Division of Molecular Carcinogenesis, The Netherlands Cancer Institute, Amsterdam 1066CX, The Netherlands.
Bioinformatics. 2016 Sep 1;32(17):i413-i420. doi: 10.1093/bioinformatics/btw449.
Clinical response to anti-cancer drugs varies between patients. A large portion of this variation can be explained by differences in molecular features, such as mutation status, copy number alterations, methylation and gene expression profiles. We show that the classic approach for combining these molecular features (Elastic Net regression on all molecular features simultaneously) results in models that are almost exclusively based on gene expression. The gene expression features selected by the classic approach are difficult to interpret as they often represent poorly studied combinations of genes, activated by aberrations in upstream signaling pathways.
To utilize all data types in a more balanced way, we developed TANDEM, a two-stage approach in which the first stage explains response using upstream features (mutations, copy number, methylation and cancer type) and the second stage explains the remainder using downstream features (gene expression). Applying TANDEM to 934 cell lines profiled across 265 drugs (GDSC1000), we show that the resulting models are more interpretable, while retaining the same predictive performance as the classic approach. Using the more balanced contributions per data type as determined with TANDEM, we find that response to MAPK pathway inhibitors is largely predicted by mutation data, while predicting response to DNA damaging agents requires gene expression data, in particular SLFN11 expression.
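The two-stage idea described above can be illustrated with a minimal sketch. This is not the code of the TANDEM R package: the helper names (`elastic_net`, `two_stage_fit`, `two_stage_predict`), the fixed penalty settings and the synthetic data are all assumptions made for illustration, and a real analysis would tune the penalty by cross-validation. The sketch fits an Elastic Net on upstream features first, then fits a second Elastic Net on the residuals using downstream (gene expression) features.

```python
import numpy as np

def soft_threshold(z, g):
    """Soft-thresholding operator used by the L1 part of the penalty."""
    return np.sign(z) * max(abs(z) - g, 0.0)

def elastic_net(X, y, lam=0.05, alpha=0.5, n_iter=200):
    """Coordinate descent for
    (1/2n)||y - b0 - Xb||^2 + lam * (alpha*||b||_1 + (1-alpha)/2 * ||b||_2^2).
    lam and alpha are fixed here (an assumption); no cross-validation is done."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean                      # centre features so the intercept separates out
    beta = np.zeros(p)
    resid = y - y_mean                   # residual for beta = 0
    col_sq = (Xc ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue                 # constant feature carries no information
            rho = Xc[:, j] @ resid / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam * alpha) / (col_sq[j] + lam * (1 - alpha))
            resid -= Xc[:, j] * (new - beta[j])
            beta[j] = new
    return y_mean - x_mean @ beta, beta

def two_stage_fit(X_up, X_down, y, **kw):
    """Stage 1: upstream features explain y. Stage 2: downstream features explain the residual."""
    b0_up, beta_up = elastic_net(X_up, y, **kw)
    residual = y - (b0_up + X_up @ beta_up)
    b0_down, beta_down = elastic_net(X_down, residual, **kw)
    return (b0_up, beta_up), (b0_down, beta_down)

def two_stage_predict(model, X_up, X_down):
    """Prediction is the sum of the two stage-wise predictions."""
    (b0_up, beta_up), (b0_down, beta_down) = model
    return b0_up + X_up @ beta_up + b0_down + X_down @ beta_down

# Synthetic data, for illustration only (not GDSC1000).
rng = np.random.default_rng(0)
n = 300
X_up = (rng.random((n, 10)) < 0.3).astype(float)   # binary "mutation" features
X_down = rng.normal(size=(n, 20))                  # "expression" features
X_down[:, 0] += 1.5 * X_up[:, 0]                   # gene 0 is a downstream readout of mutation 0
y = 2.0 * X_up[:, 0] + 1.0 * X_down[:, 5] + 0.3 * rng.normal(size=n)

model = two_stage_fit(X_up, X_down, y)
(b0_up, beta_up), (b0_down, beta_down) = model
residual_after_stage1 = y - (b0_up + X_up @ beta_up)
pred = two_stage_predict(model, X_up, X_down)
print("upstream coefficients:", np.round(beta_up, 2))
print("non-zero downstream features:", np.flatnonzero(beta_down))
```

Because the upstream model is fitted first, any response variation that a mutation can explain is assigned to that mutation, and expression features only account for what remains. In this toy setup the expression feature that merely mirrors mutation 0 therefore has little left to explain in the second stage.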
TANDEM is available as an R package on CRAN (for more information, see http://ccb.nki.nl/software/tandem).
Contact: m.michaut@nki.nl or l.wessels@nki.nl
Supplementary data are available at Bioinformatics online.