A scalable SCENIC workflow for single-cell gene regulatory network analysis.
Affiliations
VIB Center for Brain & Disease Research, KU Leuven, Leuven, Belgium.
Department of Human Genetics, KU Leuven, Leuven, Belgium.
Publication information
Nat Protoc. 2020 Jul;15(7):2247-2276. doi: 10.1038/s41596-020-0336-2. Epub 2020 Jun 19.
This protocol explains how to run a fast SCENIC analysis, alongside standard best-practice steps, on single-cell RNA-sequencing data using software containers and Nextflow pipelines. SCENIC reconstructs regulons (i.e., transcription factors and their target genes), assesses the activity of these regulons in individual cells, and uses the resulting cellular activity patterns to find meaningful clusters of cells. Here we present an improved version of SCENIC with several advances. SCENIC has been refactored and reimplemented in Python (pySCENIC), resulting in a tenfold increase in speed, and has been packaged into containers for ease of use. It is now also possible to use epigenomic track databases, in addition to motifs, to refine regulons. In this protocol, we explain the different steps of SCENIC: the workflow starts from a count matrix of gene abundances across all cells and consists of three stages. First, coexpression modules are inferred using a per-target regression approach (GRNBoost2). Next, indirect targets are pruned from these modules using cis-regulatory motif discovery (cisTarget). Lastly, the activity of each regulon is quantified via an enrichment score for the regulon's target genes (AUCell). Nonlinear projection methods can then be used to display visual groupings of cells based on the cellular activity patterns of these regulons. The results can be exported as a loom file and visualized in the SCope web application. The protocol is illustrated on two use cases: a peripheral blood mononuclear cell data set and a panel of single-cell RNA-sequencing cancer experiments. For a data set of 10,000 genes and 50,000 cells, the pipeline runs in <2 h.
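To make the third stage more concrete, the following is a minimal NumPy sketch of an AUCell-style enrichment score: genes are ranked by expression within each cell, and the score is the normalized area under the recovery curve of a regulon's target genes within the top-ranked fraction of genes. This is an illustration under stated assumptions, not the pySCENIC implementation: the function name `recovery_auc`, the `top_frac` parameter, the stable tie-breaking, and the normalization by the best achievable area are all choices made for this example.

```python
import numpy as np


def recovery_auc(expr, target_idx, top_frac=0.05):
    """AUCell-style score per cell (illustrative sketch, not the pySCENIC API).

    expr       : (n_cells, n_genes) expression or count matrix
    target_idx : column indices of one regulon's target genes
    top_frac   : fraction of top-ranked genes over which the curve is integrated
    """
    expr = np.asarray(expr, dtype=float)
    n_cells, n_genes = expr.shape
    cutoff = max(1, int(round(top_frac * n_genes)))

    # Rank genes within each cell: rank 0 is the most highly expressed gene.
    # Assumption: ties are broken by column order via a stable sort.
    order = np.argsort(-expr, axis=1, kind="stable")
    ranks = np.empty_like(order)
    ranks[np.arange(n_cells)[:, None], order] = np.arange(n_genes)[None, :]

    # A target gene at rank r < cutoff adds (cutoff - r) to the area under
    # the step-shaped recovery curve; genes ranked past the cutoff add 0.
    target_ranks = ranks[:, np.asarray(target_idx)]
    area = np.clip(cutoff - target_ranks, 0, None).sum(axis=1)

    # Normalize by the area obtained when the targets occupy the top ranks.
    n_hits_max = min(len(target_idx), cutoff)
    max_area = sum(cutoff - r for r in range(n_hits_max))
    return area / max_area


# Toy data: two cells with opposite expression orderings over 20 genes.
toy_expr = np.vstack([np.arange(20, 0, -1), np.arange(1, 21)])
toy_scores = recovery_auc(toy_expr, target_idx=[0, 1], top_frac=0.25)
print(toy_scores)
```

In the toy data, the two target genes are the most highly expressed genes in the first cell and the least expressed in the second, so the first cell receives the maximum score and the second a score of zero. In a real analysis, a cells-by-regulons matrix of such scores would feed the clustering and nonlinear projection steps described in the abstract.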