Scalable nonparametric clustering with unified marker gene selection for single-cell RNA-seq data.

Authors

Nwizu Chibuikem, Hughes Madeline, Ramseier Michelle L, Navia Andrew W, Shalek Alex K, Fusi Nicolo, Raghavan Srivatsan, Winter Peter S, Amini Ava P, Crawford Lorin

Affiliations

Center for Computational Molecular Biology, Brown University, Providence, RI, USA.

Warren Alpert Medical School of Brown University, Providence, RI, USA.

Publication

bioRxiv. 2024 Feb 12:2024.02.11.579839. doi: 10.1101/2024.02.11.579839.

Abstract

Clustering is commonly used in single-cell RNA-sequencing (scRNA-seq) pipelines to characterize cellular heterogeneity. However, current methods face two main limitations. First, they require user-specified heuristics which add time and complexity to bioinformatic workflows; second, they rely on post-selective differential expression analyses to identify marker genes driving cluster differences, which has been shown to be subject to inflated false discovery rates. We address these challenges by introducing nonparametric clustering of single-cell populations (NCLUSION): an infinite mixture model that leverages Bayesian sparse priors to identify marker genes while simultaneously performing clustering on single-cell expression data. NCLUSION uses a scalable variational inference algorithm to perform these analyses on datasets with up to millions of cells. By analyzing publicly available scRNA-seq studies, we demonstrate that NCLUSION (i) matches the performance of other state-of-the-art clustering techniques with significantly reduced runtime and (ii) provides statistically robust and biologically relevant transcriptomic signatures for each of the clusters it identifies. Overall, NCLUSION represents a reliable hypothesis-generating tool for understanding patterns of expression variation present in single-cell populations.
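The abstract does not spell out how the infinite mixture model is constructed. As a minimal sketch of the general idea only, and not NCLUSION's actual implementation, the block below uses a truncated stick-breaking construction, one standard way to represent mixture weights without fixing the number of clusters in advance. The function name `stick_breaking_weights` and the parameters `alpha` and `truncation` are assumptions introduced here for illustration.

```python
import random


def stick_breaking_weights(alpha, truncation, seed=0):
    """Draw mixture weights from a truncated stick-breaking construction.

    alpha      -- concentration parameter (hypothetical name); larger values
                  spread mass over more components
    truncation -- maximum number of components kept in the approximation
    """
    rng = random.Random(seed)
    weights = []
    remaining = 1.0  # length of the stick not yet assigned to a component
    for _ in range(truncation - 1):
        frac = rng.betavariate(1.0, alpha)  # fraction broken off the remainder
        weights.append(remaining * frac)
        remaining *= 1.0 - frac
    weights.append(remaining)  # last component absorbs the leftover mass
    return weights


weights = stick_breaking_weights(alpha=1.0, truncation=20)
effective = sum(1 for w in weights if w > 0.01)
print(f"total mass = {sum(weights):.6f}, components above 1% = {effective}")
```

Under this kind of prior, most of the mass typically lands on a few components, so the number of occupied clusters is inferred from the data rather than set by the user. That is the sense in which such models avoid a user-specified cluster count.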

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d307/10888887/4868eb6cd5d9/nihpp-2024.02.11.579839v1-f0001.jpg
