Department of Life Sciences, University of Modena and Reggio Emilia, Modena, Italy.
Department of Drug Design and Pharmacology, University of Copenhagen, Copenhagen, Denmark.
J Mol Biol. 2022 Jun 15;434(11):167560. doi: 10.1016/j.jmb.2022.167560. Epub 2022 Mar 24.
The advent of single-cell sequencing provides unprecedented opportunities to disentangle tissue complexity and investigate cell identities and functions. However, the analysis of single-cell data is a challenging, multi-step process that requires both advanced computational skills and biological insight. With single-cell RNA-seq (scRNA-seq) data, technical artifacts, noise, and biological biases make it necessary to first identify, and then remove, unreliable signals from low-quality cells and unwanted sources of variation that might compromise downstream analyses. Pre-processing and quality control (QC) of scRNA-seq data is a laborious process that consists of manually combining different computational strategies to quantify QC metrics and define optimal sets of pre-processing parameters. Here we present popsicleR, an R package that interactively guides both skilled and unskilled command-line users through the pre-processing and QC analysis of scRNA-seq data. The package integrates, into several main wrapper functions, methods derived from widely used pipelines for estimating quality-control metrics, filtering low-quality cells, normalizing data, removing technical and biological biases, and clustering and annotating cells. popsicleR starts from either the output files of the Cell Ranger pipeline from 10X Genomics or a feature-barcode matrix of raw counts generated by any scRNA-seq technology. Open-source code, installation instructions, and a case study tutorial are freely available at https://github.com/bicciatolab/popsicleR.
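As a rough illustration of the QC step described in the abstract (estimating per-cell quality metrics from a raw count matrix and filtering low-quality cells), a minimal Python sketch follows. It is not popsicleR code and does not reflect its API: the function names `qc_metrics` and `filter_cells`, the `MT-` prefix used to flag mitochondrial genes, the thresholds, and the toy counts are all assumptions made for illustration only.

```python
from typing import Dict, List

# Hypothetical toy data: {cell barcode: {gene symbol: raw count}}.
counts = {
    "cell_A": {"ACTB": 50, "GAPDH": 30, "MT-CO1": 5},
    "cell_B": {"ACTB": 2, "GAPDH": 0, "MT-CO1": 18},
    "cell_C": {"ACTB": 40, "GAPDH": 25, "MT-ND1": 10},
}


def qc_metrics(
    counts: Dict[str, Dict[str, int]], mito_prefix: str = "MT-"
) -> Dict[str, Dict[str, float]]:
    """Compute three common per-cell QC metrics from raw counts.

    Assumes mitochondrial genes are recognizable by a name prefix
    (here "MT-", an illustrative convention, not a popsicleR setting).
    """
    metrics = {}
    for cell, genes in counts.items():
        total = sum(genes.values())
        detected = sum(1 for c in genes.values() if c > 0)
        mito = sum(c for g, c in genes.items() if g.upper().startswith(mito_prefix))
        metrics[cell] = {
            "total_counts": total,
            "genes_detected": detected,
            "pct_mito": 100.0 * mito / total if total else 0.0,
        }
    return metrics


def filter_cells(
    metrics: Dict[str, Dict[str, float]],
    min_genes: int = 2,
    max_pct_mito: float = 20.0,
) -> List[str]:
    """Keep cells with enough detected genes and a low mitochondrial share.

    The default thresholds are placeholders chosen for the toy data above.
    """
    return [
        cell
        for cell, m in metrics.items()
        if m["genes_detected"] >= min_genes and m["pct_mito"] <= max_pct_mito
    ]


if __name__ == "__main__":
    kept = filter_cells(qc_metrics(counts))
    print(kept)
```

In practice such thresholds are chosen by inspecting the metric distributions for each dataset, which is the kind of parameter selection the package is described as guiding interactively.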