RCA2: a scalable supervised clustering algorithm that reduces batch effects in scRNA-seq data.

Author information

Laboratory of Systems Biology and Data Analytics, Genome Institute of Singapore, A*STAR, 60 Biopolis St, 138672, Singapore.

DUKE-NUS Medical School, 8 College Rd, 169857, Singapore.

Publication information

Nucleic Acids Res. 2021 Sep 7;49(15):8505-8519. doi: 10.1093/nar/gkab632.

Abstract

The transcriptomic diversity of cell types in the human body can be analysed in unprecedented detail using single cell (SC) technologies. Unsupervised clustering of SC transcriptomes, which is the default technique for defining cell types, is prone to group cells by technical, rather than biological, variation. Compared to de-novo (unsupervised) clustering, we demonstrate using multiple benchmarks that supervised clustering, which uses reference transcriptomes as a guide, is robust to batch effects and data quality artifacts. Here, we present RCA2, the first algorithm to combine reference projection (batch effect robustness) with graph-based clustering (scalability). In addition, RCA2 provides a user-friendly framework incorporating multiple commonly used downstream analysis modules. RCA2 also provides new reference panels for human and mouse and supports generation of custom panels. Furthermore, RCA2 facilitates cell type-specific QC, which is essential for accurate clustering of data from heterogeneous tissues. We demonstrate the advantages of RCA2 on SC data from human bone marrow, healthy PBMCs and PBMCs from COVID-19 patients. Scalable supervised clustering methods such as RCA2 will facilitate unified analysis of cohort-scale SC datasets.
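
The combination of reference projection and graph-based clustering described in the abstract can be illustrated with a minimal sketch. This is not the RCA2 package or its API. It assumes that projection means correlating each cell with every reference transcriptome (Pearson correlation here), models a batch effect as a simple per-batch rescaling and offset, and uses connected components of a k-nearest-neighbour graph as a simple stand-in for graph-based community detection. The reference profiles, gene counts, and all function names are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def project(cells, reference_profiles):
    """Represent each cell by its correlation with every reference profile."""
    return [[pearson(c, r) for r in reference_profiles] for c in cells]


def knn_graph(points, k):
    """For each point, return the indices of its k nearest neighbours."""
    neighbours = []
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        neighbours.append({j for _, j in dists[:k]})
    return neighbours


def cluster(neighbours):
    """Label connected components of the symmetrised kNN graph.

    Assumption: a simple stand-in for graph-based community detection.
    """
    adj = [set(nb) for nb in neighbours]
    for i, nb in enumerate(neighbours):
        for j in nb:
            adj[j].add(i)
    labels = [-1] * len(neighbours)
    current = 0
    for start in range(len(neighbours)):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            node = stack.pop()
            for nxt in adj[node]:
                if labels[nxt] == -1:
                    labels[nxt] = current
                    stack.append(nxt)
        current += 1
    return labels


# Hypothetical reference panel: two cell types measured over six genes.
reference = {
    "type_A": [10, 9, 8, 1, 1, 0],
    "type_B": [0, 1, 1, 8, 9, 10],
}

rng = random.Random(0)


def simulate(profile, scale, offset):
    """Noisy cell; scale and offset mimic a (simplified) batch effect."""
    return [scale * (v + rng.uniform(-0.5, 0.5)) + offset for v in profile]


# Four cells per cell type and batch; batch 2 is rescaled and shifted.
cells = []
for name in ("type_A", "type_B"):
    for scale, offset in ((1, 0), (3, 5)):
        cells += [simulate(reference[name], scale, offset) for _ in range(4)]

projected = project(cells, list(reference.values()))
labels = cluster(knn_graph(projected, k=4))
print(labels)
```

Because Pearson correlation is unchanged by a positive rescaling and offset, cells of the same type from both simulated batches land at the same place in the projected space, so the graph step groups them by cell type rather than by batch. Real batch effects are more complex than this toy model.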

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4748/8421225/138e3e970b37/gkab632fig1.jpg
