TANDEM: a two-stage approach to maximize interpretability of drug response models based on multiple molecular data types.

Author information

Aben Nanne, Vis Daniel J, Michaut Magali, Wessels Lodewyk F A

Affiliations

Division of Molecular Carcinogenesis, The Netherlands Cancer Institute, Amsterdam 1066CX, The Netherlands; Faculty of EEMCS, Delft University of Technology, Delft 2628CD, The Netherlands.

Division of Molecular Carcinogenesis, The Netherlands Cancer Institute, Amsterdam 1066CX, The Netherlands.

Publication information

Bioinformatics. 2016 Sep 1;32(17):i413-i420. doi: 10.1093/bioinformatics/btw449.

Abstract

MOTIVATION

Clinical response to anti-cancer drugs varies between patients. A large portion of this variation can be explained by differences in molecular features, such as mutation status, copy number alterations, methylation and gene expression profiles. We show that the classic approach for combining these molecular features (Elastic Net regression on all molecular features simultaneously) results in models that are almost exclusively based on gene expression. The gene expression features selected by the classic approach are difficult to interpret as they often represent poorly studied combinations of genes, activated by aberrations in upstream signaling pathways.
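For reference, the "classic approach" can be written as a single Elastic Net fit over the concatenated feature vector. The abstract gives no formula, so the notation below is an assumption. It follows the common glmnet-style parametrization, where $x_i$ stacks all molecular features of sample $i$, $y_i$ is its drug response, $\lambda$ sets the overall penalty strength and $\alpha$ mixes the L1 and L2 terms.

```latex
(\hat{\beta}_0, \hat{\beta}) = \arg\min_{\beta_0,\,\beta}\;
\frac{1}{2n}\sum_{i=1}^{n}\bigl(y_i - \beta_0 - x_i^{\top}\beta\bigr)^2
+ \lambda\Bigl[\alpha\,\lVert\beta\rVert_1 + \tfrac{1-\alpha}{2}\,\lVert\beta\rVert_2^2\Bigr]
```

Because one penalty is shared by every feature, nothing in this objective favors upstream features over gene expression features that carry the same signal.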

RESULTS

To utilize all data types in a more balanced way, we developed TANDEM, a two-stage approach in which the first stage explains response using upstream features (mutations, copy number, methylation and cancer type) and the second stage explains the remainder using downstream features (gene expression). Applying TANDEM to 934 cell lines profiled across 265 drugs (GDSC1000), we show that the resulting models are more interpretable, while retaining the same predictive performance as the classic approach. Using the more balanced contributions per data type as determined with TANDEM, we find that response to MAPK pathway inhibitors is largely predicted by mutation data, while predicting response to DNA damaging agents requires gene expression data, in particular SLFN11 expression.
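The two-stage idea described above can be sketched in a few lines. This is a minimal illustration, not the TANDEM R package or its API. The helper names (`elastic_net`, `fit_two_stage`, `predict_two_stage`), the fixed penalty values, the unstandardized features and the synthetic toy data are all assumptions made here. The Elastic Net is fitted with a plain coordinate-descent loop so the example needs only the standard library.

```python
import random

def soft_threshold(z, g):
    """Soft-thresholding operator used for the L1 part of the penalty."""
    if z > g:
        return z - g
    if z < -g:
        return z + g
    return 0.0

def elastic_net(X, y, lam=0.05, alpha=0.5, n_iter=200):
    """Fit an Elastic Net by coordinate descent; returns (intercept, coefs).
    Hypothetical helper: penalty values are fixed, not cross-validated."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]  # centered features
    resid = [yi - y_mean for yi in y]                           # residual at beta = 0
    col_sq = [sum(Xc[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of feature j with the partial residual (feature j excluded).
            rho = sum(Xc[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam * alpha) / (col_sq[j] + lam * (1 - alpha))
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= Xc[i][j] * delta
                beta[j] = new
    intercept = y_mean - sum(b * m for b, m in zip(beta, x_mean))
    return intercept, beta

def predict(model, X):
    b0, beta = model
    return [b0 + sum(b * x for b, x in zip(beta, row)) for row in X]

def fit_two_stage(X_up, X_down, y, **kw):
    """Stage 1: upstream features explain the response.
    Stage 2: downstream features explain what stage 1 left unexplained."""
    stage1 = elastic_net(X_up, y, **kw)
    remainder = [yi - pi for yi, pi in zip(y, predict(stage1, X_up))]
    stage2 = elastic_net(X_down, remainder, **kw)
    return stage1, stage2

def predict_two_stage(models, X_up, X_down):
    stage1, stage2 = models
    return [a + b for a, b in zip(predict(stage1, X_up), predict(stage2, X_down))]

# Synthetic toy data (assumed, not from GDSC1000): a binary "mutation" drives
# response, one expression feature is a noisy readout of that mutation, and a
# second expression feature carries independent signal.
rng = random.Random(0)
n = 200
mut = [rng.randint(0, 1) for _ in range(n)]
X_up = [[m, rng.randint(0, 1)] for m in mut]                      # mutation, unrelated upstream feature
X_down = [[m + rng.gauss(0, 0.3), rng.gauss(0, 1)] for m in mut]  # readout, independent expression
y = [2.0 * m + 1.0 * xd[1] + rng.gauss(0, 0.1) for m, xd in zip(mut, X_down)]

models = fit_two_stage(X_up, X_down, y)
print("stage 1 (upstream) coefficients:  ", models[0][1])
print("stage 2 (downstream) coefficients:", models[1][1])
```

In this toy setup the mutation receives most of the weight in stage 1, so the expression feature that merely mirrors it has little left to explain in stage 2, while the independent expression feature is still picked up. That is the balancing effect the abstract attributes to the two-stage design.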

AVAILABILITY AND IMPLEMENTATION

TANDEM is available as an R package on CRAN (for more information, see http://ccb.nki.nl/software/tandem).

CONTACT

m.michaut@nki.nl or l.wessels@nki.nl

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
