Baßler Kevin, Günther Patrick, Schulte-Schrepping Jonas, Becker Matthias, Biernat Paweł
Department for Genomics and Immunoregulation, Life and Medical Sciences Institute (LIMES), University of Bonn, Bonn, Germany.
Platform for Single Cell Genomics and Epigenomics, German Center for Neurodegenerative Diseases (DZNE), University of Bonn, Bonn, Germany.
Methods Mol Biol. 2019;1979:433-455. doi: 10.1007/978-1-4939-9240-9_26.
Recent technological developments in single-cell RNA-Seq (scRNA-Seq) make it possible to assay the transcriptomes of up to a million single cells in parallel. However, analyzing such large datasets is a major challenge. During the last decade, a wide variety of strategies have been proposed, covering different steps of the analysis. Here, we introduce a selection of computational tools to provide an overview of a generic analysis pipeline.
The first step of every scRNA-Seq experiment is proper study design, which does not require sophisticated experimental or informatics skills but is nonetheless arguably the most important step. The quality of the resulting data depends strictly on proper planning of the experiment, including selecting the most suitable technology for the biological question of interest and a carefully elaborated study design that minimizes the influence of confounding factors.
Once the experiment has been conducted, the raw sequencing data need to be processed to extract the gene expression information for each cell. This task comprises quality assessment of the sequenced reads, alignment against a reference genome, demultiplexing of the cell barcodes, and quantification of the reads/transcripts per gene. Like any other transcriptomics technology, single-cell mRNA-Seq requires data normalization to ensure sample-to-sample (here, cell-to-cell) comparability, as well as consideration of confounding factors.
Once gene expression values have been extracted from the reads and normalized, the researcher faces the difficult choice between a plethora of analysis approaches for investigating diverse aspects of the single-cell transcriptomes, such as dimensionality reduction and clustering to explore cellular heterogeneity, or trajectory analysis to model differentiation processes.
In this chapter, we summarize the abovementioned steps for conducting single-cell RNA-Seq analyses and present a selection of existing tools.
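As an illustration of the cell-to-cell normalization step named in the abstract, the following is a minimal sketch of library-size normalization followed by a log transform. It is not taken from the chapter: the function name `normalize_counts`, the scale factor of 10,000, and the toy count matrix are all assumptions chosen for illustration, and dedicated scRNA-Seq tools apply more elaborate schemes.

```python
import math


def normalize_counts(counts, scale=10_000):
    """Library-size normalize a cells x genes count matrix, then apply log1p.

    `counts` is a list of rows, one row of raw gene counts per cell.
    `scale` (hypothetical default) is the target total count per cell.
    """
    normalized = []
    for cell in counts:
        total = sum(cell)
        if total == 0:
            # An empty cell carries no information; keep it as all zeros.
            normalized.append([0.0 for _ in cell])
            continue
        # Divide by the cell's total count so sequencing depth cancels out,
        # rescale to a common total, and compress the range with log(1 + x).
        normalized.append([math.log1p(c / total * scale) for c in cell])
    return normalized


# Toy example: the second cell was sequenced 10x deeper than the first,
# but both have the same relative expression profile.
toy_counts = [
    [1, 3, 0],
    [10, 30, 0],
]
toy_normalized = normalize_counts(toy_counts)
```

After normalization the two toy cells have identical profiles, which is the sense in which the step makes cells comparable regardless of sequencing depth.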