Chikina Maria, Zaslavsky Elena, Sealfon Stuart C
Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, PA 15217, USA and Department of Neurology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA.
Bioinformatics. 2015 May 15;31(10):1584-91. doi: 10.1093/bioinformatics/btv015. Epub 2015 Jan 11.
Identifying alterations in gene expression associated with different clinical states is important for the study of human biology. However, clinical samples used in gene expression studies are often derived from heterogeneous mixtures with variable cell-type composition, complicating statistical analysis. Considerable effort has been devoted to modeling sample heterogeneity, and presently, there are many methods that can estimate cell proportions or pure cell-type expression from mixture data. However, there is no method that comprehensively addresses mixture analysis in the context of differential expression without relying on additional proportion information, which can be inaccurate and is frequently unavailable.
In this study, we consider a clinically relevant situation where neither accurate proportion estimates nor pure cell expression is of direct interest, but where we are rather interested in detecting and interpreting relevant differential expression in mixture samples. We develop a method, Cell-type COmputational Differential Estimation (CellCODE), that addresses the specific statistical question directly, without requiring a physical model for mixture components. Our approach is based on latent variable analysis and is computationally transparent; it requires no additional experimental data, yet outperforms existing methods that use independent proportion measurements. CellCODE has few parameters that are robust and easy to interpret. The method can be used to track changes in proportion, improve power to detect differential expression and assign the differentially expressed genes to the correct cell type.
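The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general latent-variable idea it describes, not the CellCODE implementation. It assumes that a set of marker genes for one cell type is available, that the first singular vector of their centered expression can stand in for the unknown proportion, and that this surrogate is then used as a covariate when testing a gene for a group effect. The function names `surrogate_proportion` and `adjusted_group_effect`, and all of the simulated data, are hypothetical.

```python
import numpy as np


def surrogate_proportion(marker_expr):
    """Per-sample surrogate for one cell type's proportion (hypothetical helper).

    marker_expr: genes x samples matrix of assumed marker genes.
    Returns the first right singular vector of the row-centered matrix.
    """
    centered = marker_expr - marker_expr.mean(axis=1, keepdims=True)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    spv = vt[0]
    # Singular vectors have arbitrary sign; orient along mean marker expression.
    if np.corrcoef(spv, centered.mean(axis=0))[0, 1] < 0:
        spv = -spv
    return spv


def adjusted_group_effect(gene_expr, group, spv):
    """Least-squares fit of gene_expr ~ intercept + group + spv.

    Returns the group coefficient, i.e. the group difference after
    adjusting for the surrogate proportion variable.
    """
    design = np.column_stack([np.ones_like(spv), group, spv])
    coef, *_ = np.linalg.lstsq(design, gene_expr, rcond=None)
    return coef[1]


# Simulated data (an assumption for illustration, not from the paper).
rng = np.random.default_rng(0)
n_samples = 40
proportion = rng.uniform(0.2, 0.8, n_samples)   # hidden cell-type proportion
group = np.repeat([0.0, 1.0], n_samples // 2)   # two clinical states

# Ten marker genes whose expression scales with the hidden proportion.
marker_expr = np.outer(rng.uniform(1, 3, 10), proportion)
marker_expr += rng.normal(0, 0.05, marker_expr.shape)

# One gene driven by both proportion and a true group effect of 0.5.
gene_expr = 2.0 * proportion + 0.5 * group + rng.normal(0, 0.05, n_samples)

spv = surrogate_proportion(marker_expr)
effect = adjusted_group_effect(gene_expr, group, spv)
```

In this simulation the surrogate is estimated from the mixture data alone, which mirrors the abstract's claim that no independent proportion measurements are required. How well that holds on real data depends on the quality of the marker genes.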