College of Information Technology, Shanghai Ocean University, Shanghai, China.
Key Laboratory of Fisheries Information, Ministry of Agriculture, Shanghai, China.
PLoS Comput Biol. 2020 Nov 30;16(11):e1008452. doi: 10.1371/journal.pcbi.1008452. eCollection 2020 Nov.
Deconvolution of heterogeneous bulk tumor samples into distinct cellular populations is an important yet challenging problem, particularly when only partial references are available. A common approach is to deconvolve the mixed signal using the available references and treat the residual signal as a new cell component. However, as our simulations indicate, this approach tends to over-estimate the proportions of known cell types and fails to detect novel cell types. Here, we propose PREDE, a partial reference-based deconvolution method that uses an iterative non-negative matrix factorization algorithm. On simulated datasets across a variety of parameter settings, the method effectively estimates cell proportions and the expression profiles of unknown cell types. Applying the method to TCGA tumor samples, we found that the proportions of pure cancer cells better indicate different subtypes of tumor samples. We also detected several cell types for each cancer type whose proportions successfully predicted patient survival. Our method contributes to the deconvolution of heterogeneous tumor samples and could be widely applied to many kinds of high-throughput bulk data. PREDE is implemented in R and is freely available from GitHub (https://xiaoqizheng.github.io/PREDE).
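To make the idea of partial reference-based deconvolution concrete, the following is a minimal Python sketch, not PREDE's R implementation. It assumes the bulk matrix Y (genes x samples) is approximated by W H, where the columns of W for known cell types are held fixed and only the unknown columns and the proportion matrix H are updated. The update rule used here (Lee-Seung multiplicative updates), the function name `partial_reference_nmf`, and all parameter names are illustrative assumptions; the iterative scheme PREDE actually uses may differ.

```python
import numpy as np


def partial_reference_nmf(Y, W_known, n_unknown, n_iter=500, eps=1e-9, seed=0):
    """Illustrative partial-reference NMF (hypothetical helper, not PREDE's API).

    Y        : (genes x samples) non-negative bulk expression matrix
    W_known  : (genes x k_known) reference profiles, kept fixed
    n_unknown: number of unknown cell types to estimate
    Returns W (genes x k) and H (k x samples), with columns of H summing to 1.
    """
    rng = np.random.default_rng(seed)
    n_genes, n_samples = Y.shape
    k_known = W_known.shape[1]
    k = k_known + n_unknown

    # Known profiles come first; unknown profiles start from random positive values.
    W_unknown = rng.uniform(0.1, 1.0, size=(n_genes, n_unknown)) * Y.mean()
    W = np.hstack([W_known.astype(float), W_unknown])
    H = rng.uniform(0.1, 1.0, size=(k, n_samples))
    H /= H.sum(axis=0, keepdims=True)

    for _ in range(n_iter):
        # Multiplicative update for the proportions of all cell types.
        H *= (W.T @ Y) / (W.T @ W @ H + eps)
        # Multiplicative update applied only to the unknown profile columns.
        W_candidate = W * (Y @ H.T) / (W @ H @ H.T + eps)
        W[:, k_known:] = W_candidate[:, k_known:]

    # Assumption: rescale each sample's coefficients so they read as proportions.
    H /= H.sum(axis=0, keepdims=True)
    return W, H


# Small synthetic demonstration: three cell types, of which two are known.
rng = np.random.default_rng(1)
W_true = rng.uniform(0.0, 10.0, size=(50, 3))
H_true = rng.dirichlet(np.ones(3), size=20).T  # (3 x 20), columns sum to 1
Y = W_true @ H_true

W_known = W_true[:, :2]
W, H = partial_reference_nmf(Y, W_known, n_unknown=1)
```

In this sketch the first two columns of `W` remain equal to the supplied references, the last column is the estimated profile of the unknown cell type, and each column of `H` gives the estimated proportions for one sample.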