CellCODE: a robust latent variable approach to differential expression analysis for heterogeneous cell populations.

Authors

Chikina Maria, Zaslavsky Elena, Sealfon Stuart C

Affiliations

Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, PA 15217, USA and Department of Neurology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA.

Publication information

Bioinformatics. 2015 May 15;31(10):1584-91. doi: 10.1093/bioinformatics/btv015. Epub 2015 Jan 11.

DOI:10.1093/bioinformatics/btv015

PMID:25583121

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4426841/

Abstract

MOTIVATION

Identifying alterations in gene expression associated with different clinical states is important for the study of human biology. However, clinical samples used in gene expression studies are often derived from heterogeneous mixtures with variable cell-type composition, complicating statistical analysis. Considerable effort has been devoted to modeling sample heterogeneity, and presently, there are many methods that can estimate cell proportions or pure cell-type expression from mixture data. However, there is no method that comprehensively addresses mixture analysis in the context of differential expression without relying on additional proportion information, which can be inaccurate and is frequently unavailable.

RESULTS

In this study, we consider a clinically relevant situation where neither accurate proportion estimates nor pure cell expression is of direct interest, but where we are rather interested in detecting and interpreting relevant differential expression in mixture samples. We develop a method, Cell-type COmputational Differential Estimation (CellCODE), that addresses the specific statistical question directly, without requiring a physical model for mixture components. Our approach is based on latent variable analysis and is computationally transparent; it requires no additional experimental data, yet outperforms existing methods that use independent proportion measurements. CellCODE has few parameters that are robust and easy to interpret. The method can be used to track changes in proportion, improve power to detect differential expression and assign the differentially expressed genes to the correct cell type.
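
The latent-variable idea described above can be illustrated with a small sketch. This is not the CellCODE implementation; it is a minimal example under stated assumptions: that a set of marker genes for one cell type is known, that the first singular vector of their centered expression can stand in for that cell type's relative proportion, and that this surrogate is then included as a covariate when testing a gene for a group difference. The function names (`surrogate_proportion`, `group_effect`) and the simulated data are hypothetical and chosen only for illustration.

```python
import numpy as np

def surrogate_proportion(marker_expr):
    """Surrogate proportion variable from a (genes x samples) marker matrix.

    Takes the first right singular vector of the row-centered matrix.
    Assumption for this sketch: the markers vary mainly with cell proportion.
    """
    centered = marker_expr - marker_expr.mean(axis=1, keepdims=True)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    spv = vt[0]
    # Singular vectors have arbitrary sign; orient along mean marker expression.
    if np.corrcoef(spv, marker_expr.mean(axis=0))[0, 1] < 0:
        spv = -spv
    return spv

def group_effect(y, group, covariate=None):
    """OLS estimate and t-statistic for the group term, optionally adjusted."""
    cols = [np.ones(len(y)), group.astype(float)]
    if covariate is not None:
        cols.append(covariate)
    X = np.column_stack(cols)
    beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = len(y) - X.shape[1]
    sigma2 = resid @ resid / dof
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta[1], beta[1] / np.sqrt(cov[1, 1])

# Simulated mixture data (illustrative values only).
rng = np.random.default_rng(0)
n = 40
group = np.repeat([0, 1], n // 2)
prop = rng.uniform(0.2, 0.8, size=n)           # hidden cell-type proportion
loadings = rng.uniform(1.0, 3.0, size=20)      # per-marker expression level
markers = np.outer(loadings, prop) + rng.normal(0, 0.05, (20, n))
# A gene driven by the hidden proportion plus a true group effect of 0.5.
gene = 2.0 * prop + 0.5 * group + rng.normal(0, 0.1, n)

spv = surrogate_proportion(markers)
naive = group_effect(gene, group)                     # ignores composition
adjusted = group_effect(gene, group, covariate=spv)   # adjusts for surrogate

print("naive    estimate, t:", naive)
print("adjusted estimate, t:", adjusted)
```

In this simulation the surrogate is estimated from the mixture data alone, with no measured proportions, and adjusting for it removes proportion-driven variance from the residuals, which is the mechanism by which such a covariate can improve power to detect the group effect.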

Similar articles

CellCODE: a robust latent variable approach to differential expression analysis for heterogeneous cell populations.

Bioinformatics. 2015 May 15;31(10):1584-91. doi: 10.1093/bioinformatics/btv015. Epub 2015 Jan 11.

In silico microdissection of microarray data from heterogeneous cell populations.

BMC Bioinformatics. 2005 Mar 14;6:54. doi: 10.1186/1471-2105-6-54.

Differential correlation for sequencing data.

BMC Res Notes. 2017 Jan 19;10(1):54. doi: 10.1186/s13104-016-2331-9.

A statistical approach for identifying differential distributions in single-cell RNA-seq experiments.

Genome Biol. 2016 Oct 25;17(1):222. doi: 10.1186/s13059-016-1077-y.

rSeqNP: a non-parametric approach for detecting differential expression and splicing from RNA-Seq data.

Bioinformatics. 2015 Jul 1;31(13):2222-4. doi: 10.1093/bioinformatics/btv119. Epub 2015 Feb 24.

EBSeq-HMM: a Bayesian approach for identifying gene-expression changes in ordered RNA-seq experiments.

Bioinformatics. 2015 Aug 15;31(16):2614-22. doi: 10.1093/bioinformatics/btv193. Epub 2015 Apr 5.

A fuzzy method for RNA-Seq differential expression analysis in presence of multireads.

BMC Bioinformatics. 2016 Nov 8;17(Suppl 12):345. doi: 10.1186/s12859-016-1195-2.

SPARTA: Simple Program for Automated reference-based bacterial RNA-seq Transcriptome Analysis.

BMC Bioinformatics. 2016 Feb 4;17:66. doi: 10.1186/s12859-016-0923-y.

Polyester: simulating RNA-seq datasets with differential transcript expression.

Bioinformatics. 2015 Sep 1;31(17):2778-84. doi: 10.1093/bioinformatics/btv272. Epub 2015 Apr 28.

Modeling overdispersion heterogeneity in differential expression analysis using mixtures.

Biometrics. 2016 Sep;72(3):804-14. doi: 10.1111/biom.12458. Epub 2015 Dec 18.

Cited by

Mepolizumab alters gene regulatory networks of nasal airway type-2 and epithelial inflammation in urban children with asthma.

Nat Commun. 2025 Sep 2;16(1):8191. doi: 10.1038/s41467-025-63629-2.

An augmented GSNMF model for complete deconvolution of bulk RNA-seq data.

Math Biosci Eng. 2025 Mar 14;22(4):988-1018. doi: 10.3934/mbe.2025036.

The relationship between social adversity, micro-RNA expression and post-traumatic stress in a prospective, community-based cohort.

Res Sq. 2025 Mar 17:rs.3.rs-5867503. doi: 10.21203/rs.3.rs-5867503/v1.

Multi Layered Omics Approaches Reveal Glia Specific Alterations in Alzheimer's Disease: A Systematic Review and Future Prospects.

Glia. 2025 Mar;73(3):539-573. doi: 10.1002/glia.24652. Epub 2024 Dec 9.

Embracing the informative missingness and silent gene in analyzing biologically diverse samples.

Sci Rep. 2024 Nov 16;14(1):28265. doi: 10.1038/s41598-024-78076-0.

Rapid iPSC inclusionopathy models shed light on formation, consequence, and molecular subtype of α-synuclein inclusions.

Neuron. 2024 Sep 4;112(17):2886-2909.e16. doi: 10.1016/j.neuron.2024.06.002. Epub 2024 Jul 29.

ABDS: a bioinformatics tool suite for analyzing biologically diverse samples.

Res Sq. 2024 May 30:rs.3.rs-4419408. doi: 10.21203/rs.3.rs-4419408/v1.

Alzheimer's disease rewires gene coexpression networks coupling different brain regions.

NPJ Syst Biol Appl. 2024 May 9;10(1):50. doi: 10.1038/s41540-024-00376-y.

Distinctive whole-brain cell types predict tissue damage patterns in thirteen neurodegenerative conditions.

Elife. 2024 Mar 21;12:RP89368. doi: 10.7554/eLife.89368.

Coexpression network analysis of the adult brain sheds light on the pathogenic mechanism of DDR1 in schizophrenia and bipolar disorder.

Transl Psychiatry. 2024 Feb 23;14(1):112. doi: 10.1038/s41398-024-02823-0.
