Center for Advanced Studies, Research and Development in Sardinia (CRS4), 09050 Pula, Italy.
Department of Medical, Surgical and Experimental Sciences, University of Sassari, 07100 Sassari, Italy.
Int J Mol Sci. 2024 Jul 24;25(15):8044. doi: 10.3390/ijms25158044.
Accurate detection and analysis of somatic variants in cancer involve multiple third-party tools with complex dependencies and configurations, leading to laborious, error-prone, and time-consuming data conversions. This fragmented approach lacks accuracy, reproducibility, and portability, which limits its clinical application. The pipeline presented here was developed to address these issues as an end-to-end solution for detecting, classifying, and interpreting cancer mutations. It is built around a Python command-line tool designed to manage tumor-normal samples for precise somatic mutation analysis. Its core is a Snakemake-based workflow that covers all key cancer genomics steps, including variant calling, mutational signature deconvolution, variant annotation, driver gene detection, pathway analysis, and tumor heterogeneity estimation. The pipeline can be installed via Docker on any system that supports it, with a Makefile handling installation, configuration, and execution, allowing for full or partial pipeline runs. It has been validated at the CRS4-NGS Core facility and tested on large datasets from The Cancer Genome Atlas and the Beijing Institute of Genomics, where it proved robust and flexible for somatic variant analysis in cancer. It is user-friendly, requires no specialized programming skills, and enables data processing with a single command line. Its reproducibility ensures consistent results across users following the same protocol.