Department of Bioinformatics and Computational Biology and Department of Biostatistics, The University of Texas, MD Anderson Cancer Center, Houston, TX 77030, USA.
Bioinformatics. 2013 Aug 1;29(15):1865-71. doi: 10.1093/bioinformatics/btt301. Epub 2013 May 27.
Tissue samples in which tumor cells are mixed with stromal cells cause underdetection of gene expression signatures associated with cancer prognosis or response to treatment. In silico dissection of mixed cell samples is therefore essential for analyzing expression data generated in cancer studies. No systematic approach currently addresses three challenges in computational deconvolution: (i) the violation of linear additivity of expression levels from multiple tissues when log-transformed microarray data are used; (ii) the estimation of both tumor proportion and tumor-specific expression when neither is known a priori; and (iii) the estimation of expression profiles for individual patients.
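Challenge (i) can be illustrated with a minimal Python sketch. It assumes a simple two-component mixture on the raw intensity scale; the expression values, the tumor fraction, and the function names are hypothetical, chosen for illustration only, and are not taken from DeMix or its code. The sketch shows that averaging log2 values with the mixing weights does not reproduce the log2 of the mixed raw signal.

```python
import math


def mix_raw(tumor, stroma, tumor_fraction):
    """Linear mixture on the raw scale (assumed two-component model)."""
    return tumor_fraction * tumor + (1.0 - tumor_fraction) * stroma


def mix_log2_naive(tumor, stroma, tumor_fraction):
    """Naively apply the same weights to log2-transformed values."""
    return (tumor_fraction * math.log2(tumor)
            + (1.0 - tumor_fraction) * math.log2(stroma))


# Hypothetical raw expression levels for one gene (illustrative only).
tumor_expr = 1024.0    # log2 = 10
stroma_expr = 64.0     # log2 = 6
tumor_fraction = 0.5   # assumed proportion of tumor cells in the sample

# What the array would measure if raw signals add linearly.
observed_raw = mix_raw(tumor_expr, stroma_expr, tumor_fraction)

# Log2 of the truly mixed signal versus the weighted average of log2 values.
true_log2 = math.log2(observed_raw)
naive_log2 = mix_log2_naive(tumor_expr, stroma_expr, tumor_fraction)

print(f"raw mixture:            {observed_raw:.1f}")
print(f"log2 of raw mixture:    {true_log2:.3f}")
print(f"weighted mean of log2s: {naive_log2:.3f}")
```

Because the logarithm is concave, the weighted mean of the log values is never larger than the log of the mixture, so a model that assumes additivity on the log scale would be systematically biased in this toy setting.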
We have developed DeMix, a statistical method for deconvolving mixed cancer transcriptomes that addresses these issues in array-based expression data. We demonstrate the performance of our model on synthetic and real, publicly available datasets. DeMix can be applied to ongoing biomarker-based clinical studies and to the vast expression datasets previously generated from mixed tumor and stromal cell samples.
All code is written in C and integrated into an R function, which is available at http://odin.mdacc.tmc.edu/~wwang7/DeMix.html.
Supplementary data are available at Bioinformatics online.