MRC Human Genetics Unit, Institute of Genetics & Cancer, University of Edinburgh, Edinburgh, EH4 2XU, UK.
Institute for Molecular Health Sciences, ETH Zürich, Zürich, 8093, Switzerland.
F1000Res. 2024 May 7;11:59. doi: 10.12688/f1000research.74416.1. eCollection 2022.
Cell-to-cell gene expression variability is an inherent feature of complex biological systems, such as immunity and development. Single-cell RNA sequencing is a powerful tool to quantify this heterogeneity, but it is prone to strong technical noise. In this article, we describe a step-by-step computational workflow that uses the BASiCS Bioconductor package to robustly quantify expression variability within and between known groups of cells (such as experimental conditions or cell types). BASiCS uses an integrated framework for data normalisation, technical noise quantification and downstream analyses, propagating statistical uncertainty across these steps. Within a single seemingly homogeneous cell population, BASiCS can identify highly variable genes that exhibit strong heterogeneity as well as lowly variable genes with stable expression. BASiCS also uses a probabilistic decision rule to identify changes in expression variability between cell populations, whilst avoiding confounding effects related to differences in technical noise or in overall abundance. Using a publicly available dataset, we guide users through a complete pipeline that includes preliminary steps for quality control, as well as data exploration using the scater and scran Bioconductor packages. The workflow is accompanied by a Docker image that ensures the reproducibility of our results.
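The abstract mentions a probabilistic decision rule for detecting changes in variability. A common form of this kind of rule works from posterior samples. For each gene it estimates the tail posterior probability that an effect exceeds a minimum threshold tau. It then calls genes whose probability exceeds an evidence threshold alpha, choosing alpha so that the expected false discovery rate (EFDR) stays at or below a target. The Python sketch below illustrates that generic idea only. It is not the BASiCS R interface. The function names, the alpha grid and the default target of 0.10 are hypothetical choices made for illustration.

```python
from typing import List, Optional, Sequence


def tail_probabilities(draws: Sequence[Sequence[float]], tau: float) -> List[float]:
    """Per gene, the fraction of posterior draws with |effect| > tau.

    draws[g] holds MCMC draws of a hypothetical per-gene effect, for example a
    between-group change in a variability parameter (an illustrative assumption).
    """
    return [sum(abs(x) > tau for x in gene) / len(gene) for gene in draws]


def efdr(probs: Sequence[float], alpha: float) -> float:
    """Expected false discovery rate when genes with probability > alpha are called."""
    called = [p for p in probs if p > alpha]
    if not called:
        return 0.0
    # Each called gene contributes (1 - p), its posterior probability of being a null.
    return sum(1.0 - p for p in called) / len(called)


def calibrate_threshold(
    probs: Sequence[float],
    target: float = 0.10,
    grid: Optional[Sequence[float]] = None,
) -> float:
    """Smallest alpha on a grid whose EFDR does not exceed the target.

    The grid (0.5 to 0.995 in steps of 0.005) is a hypothetical choice.
    """
    if grid is None:
        grid = [0.5 + 0.005 * i for i in range(100)]
    for alpha in grid:
        if efdr(probs, alpha) <= target:
            return alpha
    # No grid value reaches the target, so fall back to the strictest threshold.
    return grid[-1]


def call_genes(probs: Sequence[float], alpha: float) -> List[int]:
    """Indices of genes whose tail probability exceeds alpha."""
    return [g for g, p in enumerate(probs) if p > alpha]
```

For example, suppose three genes each have ten posterior draws and tau = 1. The estimated tail probabilities might be 1.0, 0.0 and 0.5. The calibrated alpha is then 0.5, and only the first gene is called. Thresholding on posterior probability, rather than on a point estimate, is what allows uncertainty from normalisation and technical-noise estimation to carry through to the final decision.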