Department of Molecular Medicine, Institute of Basic Medical Sciences, University of Oslo, 0372 Oslo, Norway.
Oslo Centre for Biostatistics and Epidemiology, Faculty of Medicine, University of Oslo, 0372 Oslo, Norway.
Int J Mol Sci. 2021 Jan 30;22(3):1399. doi: 10.3390/ijms22031399.
The growing attention toward the benefits of single-cell RNA sequencing (scRNA-seq) has led to a myriad of computational packages for analyzing different aspects of scRNA-seq data. For researchers without advanced programming skills, it is very challenging to combine several packages to perform the desired analysis in a simple and reproducible way. Here we present DIscBIO, an open-source, multi-algorithmic pipeline for easy, efficient and reproducible analysis of cellular sub-populations at the transcriptomic level. The pipeline integrates multiple scRNA-seq packages and, starting from single-cell sequencing read counts, uses clustering and differential analysis to enable biomarker discovery with decision trees and gene enrichment analysis in a network context. DIscBIO is freely available as an R package. It can be run either in command-line mode or through a user-friendly computational pipeline using Jupyter notebooks. We showcase all pipeline features using two scRNA-seq datasets. The first dataset consists of circulating tumor cells from patients with breast cancer. The second is a cell cycle regulation dataset in myxoid liposarcoma. All analyses are available as notebooks that integrate R code with explanatory text, output data and images in a sequential narrative. R users can use the notebooks to understand the different steps of the pipeline and as a guide for exploring their own scRNA-seq data. We also provide a cloud version using Binder that allows the pipeline to be executed without downloading R, Jupyter or any of the packages used by the pipeline. The cloud version can serve as a tutorial for training purposes, especially for those who are not R users or have limited programming skills. However, in order to perform meaningful scRNA-seq analyses, all users will need to understand the implemented methods and their possible options and limitations.