Dirichlet Process Mixture Model for Correcting Technical Variation in Single-Cell Gene Expression Data.

Authors

Prabhakaran Sandhya, Azizi Elham, Carr Ambrose, Pe'er Dana

Affiliations

Departments of Biological Sciences, Systems Biology and Computer Science, Columbia University, New York, NY, USA.

Publication

JMLR Workshop Conf Proc. 2016;48:1070-1079.

Abstract

We introduce an iterative normalization and clustering method for single-cell gene expression data. The emerging technology of single-cell RNA-seq gives access to gene expression measurements for thousands of cells, allowing discovery and characterization of cell types. However, the data is confounded by technical variation emanating from experimental errors and cell type-specific biases. Current approaches perform a global normalization prior to analyzing biological signals, which does not resolve missing data or variation dependent on latent cell types. Our model is formulated as a hierarchical Bayesian mixture model with cell-specific scalings that aid the iterative normalization and clustering of cells, teasing apart technical variation from biological signals. We demonstrate that this approach is superior to global normalization followed by clustering. We show identifiability and weak convergence guarantees of our method and present a scalable Gibbs inference algorithm. This method improves cluster inference in both synthetic and real single-cell data compared with previous methods, and allows easy interpretation and recovery of the underlying structure and cell types.
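The idea of cell-specific scalings within a mixture can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' model: the multiplicative scaling, the Gaussian noise, and the names `simulate_cells` and `estimate_scaling` are all hypothetical. The estimate below also treats cluster means and assignments as known, whereas the abstract describes inferring them jointly with the scalings via Gibbs sampling.

```python
import random

def simulate_cells(n_cells=200, n_genes=50, n_clusters=3, noise_sd=0.1, seed=0):
    """Draw cells from a toy mixture with a per-cell multiplicative scaling.

    Assumed for illustration only: y = alpha * cluster_mean + Gaussian noise.
    """
    rng = random.Random(seed)
    # Hypothetical cluster mean expression profiles.
    means = [[rng.uniform(1.0, 5.0) for _ in range(n_genes)]
             for _ in range(n_clusters)]
    cells = []
    for _ in range(n_cells):
        k = rng.randrange(n_clusters)      # latent cell type
        alpha = rng.uniform(0.5, 1.5)      # cell-specific technical scaling
        y = [alpha * m + rng.gauss(0.0, noise_sd) for m in means[k]]
        cells.append({"cluster": k, "alpha": alpha, "y": y})
    return means, cells

def estimate_scaling(y, mean):
    """Least-squares estimate of alpha given the cell's cluster mean."""
    num = sum(a * b for a, b in zip(y, mean))
    den = sum(b * b for b in mean)
    return num / den

means, cells = simulate_cells()
errors = [abs(estimate_scaling(c["y"], means[c["cluster"]]) - c["alpha"])
          for c in cells]
max_error = max(errors)
```

Because the scaling is estimated against the mean of the cell's own cluster, the estimate depends on the cluster assignment. That dependence is why normalization and clustering are coupled in this kind of model, rather than run as a single global normalization step followed by clustering.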
