Statistical significance of cluster membership for unsupervised evaluation of cell identities.

Author information

Institute of Informatics, Faculty of Mathematics, Informatics, and Mechanics, University of Warsaw, Warsaw 02-097, Poland.

NHLBI Integrated Cardiovascular Data Science Training Program, University of California, Los Angeles, CA 90095, USA.

Publication information

Bioinformatics. 2020 May 1;36(10):3107-3114. doi: 10.1093/bioinformatics/btaa087.

Abstract

MOTIVATION

Single-cell RNA-sequencing (scRNA-seq) allows us to dissect transcriptional heterogeneity arising from cellular types, spatio-temporal contexts and environmental stimuli. Transcriptional heterogeneity may reflect phenotypes and molecular signatures that are often unmeasured or unknown a priori. Cell identities of samples derived from heterogeneous subpopulations are then determined by clustering scRNA-seq data. These cell identities are used in downstream analyses. How can we examine whether cell identities are accurately inferred? Unlike external measurements or labels for single cells, clustering-based cell identities can result in spurious signals and false discoveries.

RESULTS

We introduce non-parametric methods to evaluate cell identities by testing cluster memberships in an unsupervised manner. Diverse simulation studies demonstrate the accuracy of the jackstraw test for cluster membership. We propose a posterior probability that a cell should be included in its clustering-based subpopulation. Posterior inclusion probabilities (PIPs) for cluster memberships can be used to select and visualize samples relevant to subpopulations. The proposed methods are applied to three scRNA-seq datasets. First, a mixture of Jurkat and 293T cell lines provides two distinct cellular populations. Second, Cell Hashing yields cell identities corresponding to eight donors, which are independently analyzed by the jackstraw. Third, peripheral blood mononuclear cells are used to explore heterogeneous immune populations. The proposed P-values and PIPs lead to probabilistic feature selection of single cells that can be visualized using principal component analysis (PCA), t-distributed stochastic neighbor embedding (t-SNE) and other methods. By learning uncertainty in clustering high-dimensional data, the proposed methods enable unsupervised evaluation of cluster membership.
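To make the idea of testing cluster membership concrete, here is a minimal Python sketch of a jackstraw-style permutation test. It is not the code or API of the jackstraw R package, and everything in it is an assumption made for illustration: the function names (`kmeans`, `r_squared`, `jackstraw_pvalues`, `make_demo_data`), the parameters `s` (cells permuted per round) and `B` (number of rounds), the toy data, the use of squared correlation as the statistic (a monotone stand-in for the regression F-statistic, since F = (m-2)r²/(1-r²) for m features), and the +1 correction in the empirical P-value.

```python
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(rows, init_centers, n_iter=25):
    """Lloyd's k-means from fixed starting centers; returns (centers, labels)."""
    centers = [list(c) for c in init_centers]
    labels = None
    for _ in range(n_iter):
        new_labels = [min(range(len(centers)), key=lambda k: sq_dist(r, centers[k]))
                      for r in rows]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for k in range(len(centers)):
            members = [r for r, lab in zip(rows, labels) if lab == k]
            if members:  # keep the old center if a cluster empties
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels

def r_squared(x, y):
    """Squared Pearson correlation between a cell and a cluster center."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy * sxy / (sxx * syy)

def jackstraw_pvalues(rows, init_centers, s=2, B=50, seed=0):
    """Empirical P-values for each cell's membership in its assigned cluster."""
    rng = random.Random(seed)
    centers, labels = kmeans(rows, init_centers)
    observed = [r_squared(r, centers[lab]) for r, lab in zip(rows, labels)]
    null = []
    for _ in range(B):
        perturbed = [list(r) for r in rows]
        picked = rng.sample(range(len(rows)), s)
        for i in picked:
            rng.shuffle(perturbed[i])  # permute features: a synthetic null cell
        # Re-cluster with the null cells included, so the null statistics
        # reflect the same fitting procedure as the observed ones.
        null_centers, null_labels = kmeans(perturbed, centers)
        null.extend(r_squared(perturbed[i], null_centers[null_labels[i]])
                    for i in picked)
    pvals = [(1 + sum(z >= t for z in null)) / (1 + len(null)) for t in observed]
    return pvals, labels

def make_demo_data(seed=1, n_per_group=20, n_noise=2, m=20):
    """Two well-separated toy populations plus a few unstructured cells."""
    rng = random.Random(seed)
    half = m // 2
    pattern_a = [5.0] * half + [0.0] * (m - half)
    pattern_b = [0.0] * half + [5.0] * (m - half)
    rows = [[v + rng.gauss(0, 1) for v in p]
            for p in [pattern_a] * n_per_group + [pattern_b] * n_per_group]
    rows += [[rng.gauss(2.5, 1) for _ in range(m)] for _ in range(n_noise)]
    return rows

rows = make_demo_data()
pvals, labels = jackstraw_pvalues(rows, init_centers=[rows[0], rows[20]])
print("largest P-value among structured cells:", max(pvals[:40]))
print("P-values of unstructured cells:", pvals[40:])
```

In this toy setting, cells that follow one of the two patterns should receive small P-values, while the unstructured cells have no such guarantee. The sketch stops at P-values; estimating PIPs from them is not shown.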

AVAILABILITY AND IMPLEMENTATION

https://cran.r-project.org/package=jackstraw.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9929/7214036/a114d0647a3a/btaa087f1.jpg
