Dirichlet Process Mixture Model for Correcting Technical Variation in Single-Cell Gene Expression Data

Authors

Prabhakaran Sandhya, Azizi Elham, Carr Ambrose, Pe'er Dana

Affiliations

Departments of Biological Sciences, Systems Biology and Computer Science, Columbia University, New York, NY, USA.

Publication details

JMLR Workshop Conf Proc. 2016;48:1070-1079.

PMID:29928470

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6004614/

Abstract

We introduce an iterative normalization and clustering method for single-cell gene expression data. The emerging technology of single-cell RNA-seq gives access to gene expression measurements for thousands of cells, allowing discovery and characterization of cell types. However, the data is confounded by technical variation emanating from experimental errors and cell type-specific biases. Current approaches perform a global normalization prior to analyzing biological signals, which does not resolve missing data or variation dependent on latent cell types. Our model is formulated as a hierarchical Bayesian mixture model with cell-specific scalings that aid the iterative normalization and clustering of cells, teasing apart technical variation from biological signals. We demonstrate that this approach is superior to global normalization followed by clustering. We show identifiability and weak convergence guarantees of our method and present a scalable Gibbs inference algorithm. This method improves cluster inference in both synthetic and real single-cell data compared with previous methods, and allows easy interpretation and recovery of the underlying structure and cell types.
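
The abstract names a Dirichlet process mixture fitted by Gibbs sampling. As a rough illustration of that general technique only, the following is a minimal sketch of a collapsed Gibbs sampler under a Chinese restaurant process prior for one-dimensional Gaussian data. It is not the authors' model: it omits the cell-specific scalings, and the known within-cluster standard deviation `sigma`, the normal prior (`mu0`, `tau0`) on cluster means, the concentration `alpha`, and the function name `crp_gibbs` are all assumptions made for this example.

```python
import numpy as np


def _log_normal(x, mean, var):
    """Log density of N(mean, var) evaluated at x."""
    return -0.5 * (np.log(2 * np.pi * var) + (x - mean) ** 2 / var)


def crp_gibbs(x, alpha=1.0, sigma=1.0, mu0=0.0, tau0=10.0, sweeps=30, seed=0):
    """Collapsed Gibbs sampling for a 1-D DP mixture of Gaussians.

    Assumes a known within-cluster std `sigma` and a N(mu0, tau0^2) prior on
    each cluster mean (illustrative choices, not the paper's model).
    Returns one integer cluster label per data point.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    n = x.size
    z = np.zeros(n, dtype=int)  # start with every point in a single cluster
    for _ in range(sweeps):
        for i in range(n):
            z[i] = -1  # temporarily remove point i from its cluster
            labels = np.unique(z[z >= 0])
            log_w = []
            for k in labels:
                members = x[z == k]
                m = members.size
                # Posterior of the cluster mean given its current members.
                post_var = 1.0 / (1.0 / tau0**2 + m / sigma**2)
                post_mean = post_var * (mu0 / tau0**2 + members.sum() / sigma**2)
                # CRP weight (cluster size) times posterior predictive density.
                log_w.append(np.log(m) + _log_normal(x[i], post_mean, post_var + sigma**2))
            # New cluster: CRP weight alpha times prior predictive density.
            log_w.append(np.log(alpha) + _log_normal(x[i], mu0, tau0**2 + sigma**2))
            log_w = np.array(log_w)
            p = np.exp(log_w - log_w.max())  # subtract max for numerical stability
            p /= p.sum()
            choice = rng.choice(len(p), p=p)
            if choice < len(labels):
                z[i] = labels[choice]
            else:
                z[i] = labels.max() + 1 if labels.size else 0
    return z


# Usage on synthetic data: two well-separated groups of 20 points each.
data_rng = np.random.default_rng(1)
x = np.concatenate([data_rng.normal(-10, 1, 20), data_rng.normal(10, 1, 20)])
z = crp_gibbs(x)
print("clusters found:", len(np.unique(z)))
```

The number of clusters is not fixed in advance; it grows or shrinks as points open new clusters or empty old ones, which is the property that makes a Dirichlet process prior attractive when the number of cell types is unknown.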

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f3c5/6004614/2e30674eabd2/nihms972080f1.jpg

Similar articles

Dirichlet Process Mixture Model for Correcting Technical Variation in Single-Cell Gene Expression Data.

JMLR Workshop Conf Proc. 2016;48:1070-1079.

DIMM-SC: a Dirichlet mixture model for clustering droplet-based single cell transcriptomic data.

Bioinformatics. 2018 Jan 1;34(1):139-146. doi: 10.1093/bioinformatics/btx490.

Clustering distributions with the marginalized nested Dirichlet process.

Biometrics. 2018 Jun;74(2):584-594. doi: 10.1111/biom.12778. Epub 2017 Sep 28.

Expression analysis of RNA sequencing data from human neural and glial cell lines depends on technical replication and normalization methods.

BMC Bioinformatics. 2018 Nov 20;19(Suppl 14):412. doi: 10.1186/s12859-018-2382-0.

Hierarchical Dirichlet process model for gene expression clustering.

EURASIP J Bioinform Syst Biol. 2013 Apr 12;2013(1):5. doi: 10.1186/1687-4153-2013-5.

A COMPOSITIONAL MODEL TO ASSESS EXPRESSION CHANGES FROM SINGLE-CELL RNA-SEQ DATA.

Ann Appl Stat. 2021 Jun;15(2):880-901. doi: 10.1214/20-aoas1423. Epub 2021 Jul 12.

A UNIFIED STATISTICAL FRAMEWORK FOR SINGLE CELL AND BULK RNA SEQUENCING DATA.

Ann Appl Stat. 2018 Mar;12(1):609-632. doi: 10.1214/17-AOAS1110. Epub 2018 Mar 9.

A BAYESIAN NONPARAMETRIC MODEL FOR INFERRING SUBCLONAL POPULATIONS FROM STRUCTURED DNA SEQUENCING DATA.

Ann Appl Stat. 2021 Jun;15(2):925-951. doi: 10.1214/20-aoas1434. Epub 2021 Jul 12.

A Dirichlet process mixture of generalized Dirichlet distributions for proportional data modeling.

IEEE Trans Neural Netw. 2010 Jan;21(1):107-22. doi: 10.1109/TNN.2009.2034851. Epub 2009 Dec 4.

Axially Symmetric Data Clustering Through Dirichlet Process Mixture Models of Watson Distributions.

IEEE Trans Neural Netw Learn Syst. 2019 Jun;30(6):1683-1694. doi: 10.1109/TNNLS.2018.2872986. Epub 2018 Oct 23.

Cited by

Evaluating discrepancies in dimensionality reduction for time-series single-cell RNA-sequencing data.

Brief Bioinform. 2025 May 1;26(3). doi: 10.1093/bib/bbaf287.

Zero-shot evaluation reveals limitations of single-cell foundation models.

Genome Biol. 2025 Apr 18;26(1):101. doi: 10.1186/s13059-025-03574-x.

scINRB: single-cell gene expression imputation with network regularization and bulk RNA-seq data.

Brief Bioinform. 2024 Mar 27;25(3). doi: 10.1093/bib/bbae148.

Scalable nonparametric clustering with unified marker gene selection for single-cell RNA-seq data.

bioRxiv. 2024 Feb 12:2024.02.11.579839. doi: 10.1101/2024.02.11.579839.

scCURE identifies cell types responding to immunotherapy and enables outcome prediction.

Cell Rep Methods. 2023 Nov 20;3(11):100643. doi: 10.1016/j.crmeth.2023.100643.

A new and effective two-step clustering approach for single cell RNA sequencing data.

BMC Genomics. 2023 Nov 9;23(Suppl 6):864. doi: 10.1186/s12864-023-09577-x.

Essential procedures of single-cell RNA sequencing in multiple myeloma and its translational value.

Blood Sci. 2023 Nov 2;5(4):221-236. doi: 10.1097/BS9.0000000000000172. eCollection 2023 Oct.

scKINETICS: inference of regulatory velocity with single-cell transcriptomics data.

Bioinformatics. 2023 Jun 30;39(39 Suppl 1):i394-i403. doi: 10.1093/bioinformatics/btad267.

Bayesian cluster analysis.

Philos Trans A Math Phys Eng Sci. 2023 May 15;381(2247):20220149. doi: 10.1098/rsta.2022.0149. Epub 2023 Mar 27.

Graph embedding and Gaussian mixture variational autoencoder network for end-to-end analysis of single-cell RNA sequencing data.

Cell Rep Methods. 2023 Jan 5;3(1):100382. doi: 10.1016/j.crmeth.2022.100382. eCollection 2023 Jan 23.
