Shekhar Karthik, Menon Vilas
Klarman Cell Observatory, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
Janelia Research Campus, Howard Hughes Medical Institute, Ashburn, VA, USA.
Methods Mol Biol. 2019;1935:45-77. doi: 10.1007/978-1-4939-9057-3_4.
Unprecedented advances in single-cell RNA-sequencing (scRNA-seq) technology have made it possible to profile genome-wide expression in single cells at low cost and high throughput. There is substantial ongoing effort to use scRNA-seq measurements to identify the "cell types" that make up a complex tissue, akin to taxonomizing species in ecology. Cell-type classification from scRNA-seq data involves applying computational tools rooted in dimensionality reduction and clustering, followed by statistical analysis to identify the molecular signatures unique to each type. As datasets continue to grow in size and complexity, computational challenges abound, requiring analytical methods to be scalable, flexible, and robust. Moreover, careful attention must be paid to the experimental biases and statistical challenges unique to these measurements in order to avoid artifacts. This chapter introduces these topics in the context of cell-type identification, and outlines an instructive step-by-step example bioinformatic pipeline for researchers entering this field.
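The workflow named in the abstract (dimensionality reduction, clustering, then a search for type-specific molecular signatures) can be illustrated with a minimal sketch. This is not the chapter's pipeline: the toy count matrix, the gene names, and every function name below are hypothetical, the dimensionality reduction is a single principal component computed by power iteration, the clustering is a two-group k-means on that component, and the "signature" is a plain difference in mean log-expression rather than a proper statistical test. Real analyses would typically rely on dedicated scRNA-seq toolkits and many more cells and genes.

```python
import math


def normalize(counts, scale=1e4):
    """Scale each cell to a common library size, then log1p-transform."""
    normalized = []
    for cell in counts:
        total = sum(cell)
        normalized.append([math.log1p(c / total * scale) for c in cell])
    return normalized


def first_pc_scores(X, n_iter=100):
    """Project cells onto the leading principal component (power iteration)."""
    n_cells, n_genes = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n_cells for j in range(n_genes)]
    centered = [[row[j] - means[j] for j in range(n_genes)] for row in X]
    # Non-uniform start vector so it is unlikely to be orthogonal to the PC.
    v = [float(j + 1) for j in range(n_genes)]
    for _ in range(n_iter):
        scores = [sum(a * b for a, b in zip(row, v)) for row in centered]
        w = [sum(scores[i] * centered[i][j] for i in range(n_cells))
             for j in range(n_genes)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return [sum(a * b for a, b in zip(row, v)) for row in centered]


def two_means_1d(scores, n_iter=20):
    """Two-cluster k-means on 1-D scores, initialized at the extremes."""
    centers = [min(scores), max(scores)]
    labels = [0] * len(scores)
    for _ in range(n_iter):
        labels = [0 if abs(s - centers[0]) <= abs(s - centers[1]) else 1
                  for s in scores]
        for k in (0, 1):
            members = [s for s, lab in zip(scores, labels) if lab == k]
            if members:
                centers[k] = sum(members) / len(members)
    return labels


def marker_gene(X, labels, cluster, gene_names):
    """Gene with the largest mean log-expression gain inside the cluster."""
    inside = [row for row, lab in zip(X, labels) if lab == cluster]
    outside = [row for row, lab in zip(X, labels) if lab != cluster]
    diffs = []
    for j in range(len(gene_names)):
        mean_in = sum(r[j] for r in inside) / len(inside)
        mean_out = sum(r[j] for r in outside) / len(outside)
        diffs.append(mean_in - mean_out)
    return gene_names[diffs.index(max(diffs))]


# Hypothetical toy data: rows are cells, columns are genes (raw counts).
gene_names = ["GeneA", "Housekeeping", "GeneB", "LowGene"]
counts = [
    [90, 50, 5, 10],
    [80, 45, 4, 9],
    [100, 55, 6, 11],
    [5, 50, 70, 10],
    [4, 48, 85, 9],
    [6, 52, 60, 11],
]

X = normalize(counts)
labels = two_means_1d(first_pc_scores(X))
markers = {k: marker_gene(X, labels, k, gene_names) for k in sorted(set(labels))}
print(labels, markers)
```

The cluster numbering is arbitrary because the sign of a principal component is arbitrary, so downstream code should compare which cells share a label rather than rely on a specific label value.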