Kubovčiak Jan, Kolář Michal, Novotný Jiří
Laboratory of Genomics and Bioinformatics, Institute of Molecular Genetics of the Czech Academy of Sciences, Vídeňská 1083, 142 20 Prague 4, Czech Republic.
Department of Informatics and Chemistry, Faculty of Chemical Technology, University of Chemistry and Technology in Prague, Technická 5, 166 28 Prague 6, Czech Republic.
Bioinform Adv. 2023 Jul 6;3(1):vbad089. doi: 10.1093/bioadv/vbad089. eCollection 2023.
While the workflow for primary analysis of single-cell RNA-seq (scRNA-seq) data is well established, the secondary analysis of the feature-barcode matrix is usually done with custom scripts. There is no fully automated pipeline in the R statistical environment that follows current best programming practices and meets the requirements for reproducibility.
We have developed scdrake, a fully automated workflow for secondary analysis of scRNA-seq data, implemented entirely in the R language and built within the drake framework. The pipeline includes quality control, cell and gene filtering, normalization, detection of highly variable genes, dimensionality reduction, clustering, cell type annotation, detection of marker genes, differential expression analysis and integration of multiple samples. The pipeline is reproducible and scalable, executes efficiently, is easy to extend, gives access to intermediate results and outputs rich HTML reports. Scdrake is distributed as a Docker image, which makes setup straightforward and enhances reproducibility.
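To make the first stages listed above concrete, the following is a minimal sketch of cell filtering, gene filtering, normalization and selection of highly variable genes on a toy feature-barcode matrix. It is not scdrake code: scdrake is implemented in R, and the gene names, thresholds, the counts-per-10,000 log normalization and the variance-based ranking used here are all illustrative assumptions rather than the methods or defaults of the pipeline.

```python
import math
from statistics import pvariance

# Toy feature-barcode matrix (hypothetical data): rows are genes, columns are cells.
barcodes = ["cell1", "cell2", "cell3", "cell4"]
counts = {
    "GeneA": [10, 0, 12, 0],
    "GeneB": [5, 0, 4, 6],
    "GeneC": [0, 0, 0, 1],
    "GeneD": [20, 1, 2, 18],
}


def filter_cells(counts, barcodes, min_total=5, min_genes=2):
    """Keep cells with at least min_total counts and min_genes detected genes."""
    keep = []
    for j in range(len(barcodes)):
        column = [row[j] for row in counts.values()]
        detected = sum(1 for c in column if c > 0)
        if sum(column) >= min_total and detected >= min_genes:
            keep.append(j)
    kept_counts = {g: [row[j] for j in keep] for g, row in counts.items()}
    return kept_counts, [barcodes[j] for j in keep]


def filter_genes(counts, min_cells=2):
    """Keep genes detected in at least min_cells of the retained cells."""
    return {
        g: row for g, row in counts.items()
        if sum(1 for c in row if c > 0) >= min_cells
    }


def normalize(counts, scale=10_000):
    """Scale each cell to a common library size, then apply log(1 + x)."""
    n_cells = len(next(iter(counts.values())))
    lib_sizes = [sum(row[j] for row in counts.values()) for j in range(n_cells)]
    return {
        g: [math.log1p(row[j] / lib_sizes[j] * scale) for j in range(n_cells)]
        for g, row in counts.items()
    }


def highly_variable_genes(normalized, n_top=2):
    """Rank genes by variance of normalized expression and return the top n_top."""
    ranked = sorted(normalized, key=lambda g: pvariance(normalized[g]), reverse=True)
    return ranked[:n_top]


# Run the steps in order; each intermediate result stays available for inspection.
cell_filtered, kept_barcodes = filter_cells(counts, barcodes)
filtered = filter_genes(cell_filtered)
normalized = normalize(filtered)
hvgs = highly_variable_genes(normalized)

print("Retained cells:", kept_barcodes)
print("Retained genes:", sorted(filtered))
print("Highly variable genes:", hvgs)
```

Each step takes the output of the previous one, which mirrors in miniature how a pipeline framework can expose intermediate results; dependency tracking and caching, which drake provides, are not modeled here.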
The source code and documentation are available under the MIT license at https://github.com/bioinfocz/scdrake and https://bioinfocz.github.io/scdrake, respectively.
Supplementary data are available online.