European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany.
Bridging Research Division on Mechanisms of Genomic Variation and Data Science, German Cancer Research Center (DKFZ), Heidelberg, Germany.
Bioinformatics. 2023 Nov 1;39(11). doi: 10.1093/bioinformatics/btad633.
Single-cell DNA template strand sequencing (Strand-seq) enables a range of genomic analyses, including chromosome-length haplotype phasing and structural variation (SV) calling in individual cells. Here, we present MosaiCatcher v2, a standardized workflow and reference framework for single-cell SV detection using Strand-seq. This framework introduces a range of functionalities, including: an automated upstream quality control (QC) and assembly sub-workflow that relies on multiple genome assemblies and incorporates a multistep normalization module; integration of the single-cell nucleosome occupancy and genetic variation analysis (scNOVA) SV functional characterization module and the ArbiGent SV genotyping module; platform portability; and a user-friendly, shareable web report. These new features enable reproducible computational processing of Strand-seq data, which are increasingly used in human genetics and single-cell genomics, and move the workflow toward production environments. MosaiCatcher v2 is compatible with both container and conda environments, ensuring reproducibility and robustness and positioning the framework as a cornerstone in computational processing of Strand-seq data.
MosaiCatcher v2 is a standardized workflow implemented using the Snakemake workflow management system. The pipeline is available on GitHub: https://github.com/friendsofstrandseq/mosaicatcher-pipeline/ and in the snakemake-workflow-catalog: https://snakemake.github.io/snakemake-workflow-catalog/?usage=friendsofstrandseq/mosaicatcher-pipeline. The Strand-seq example input data used in the publication are listed in the Data availability statement. Additionally, a lightweight dataset for testing is provided in the GitHub repository.