Genomic Epidemiology Branch, International Agency for Research on Cancer (IARC/WHO), Lyon, France.
BMC Bioinformatics. 2021 Nov 4;22(1):540. doi: 10.1186/s12859-021-04450-8.
Mutational signatures have proved to be a useful tool for identifying patterns of mutations in genomes, often providing valuable insights into mutagenic processes or normal DNA damage. De novo extraction of signatures is commonly performed using Non-Negative Matrix Factorisation methods; however, accurate attribution of these signatures to individual samples is a distinct problem requiring uncertainty estimation, particularly in noisy scenarios or when the acting signatures have similar shapes. Whilst many packages for signature attribution exist, few provide accuracy measures, and most are neither easily reproducible nor scalable in high-performance computing environments.
We present Mutational Signature Attribution (MSA), a reproducible pipeline designed to assign signatures of different mutation types on a single-sample basis, using the Non-Negative Least Squares method with optimisation based on configurable simulations. Parametric bootstrap is proposed as a way to measure the statistical uncertainty of signature attribution. Supported mutation types include single and doublet base substitutions, indels and structural variants. Results are validated using simulations with reference COSMIC signatures, as well as randomly generated signatures.
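The combination of Non-Negative Least Squares attribution and parametric bootstrap described above can be illustrated with a minimal sketch. This is not the MSA implementation: the function names `attribute` and `bootstrap_intervals`, the multinomial resampling model, the number of bootstrap replicates and the percentile intervals are all assumptions made for illustration, and the simulation-based optimisation step is not shown.

```python
import numpy as np
from scipy.optimize import nnls


def attribute(signatures, counts):
    """Fit non-negative signature activities to one sample's mutation counts.

    signatures: (n_channels, n_signatures) matrix, each column summing to 1.
    counts:     (n_channels,) vector of observed mutation counts.
    """
    activities, _residual_norm = nnls(signatures, counts)
    return activities


def bootstrap_intervals(signatures, counts, n_boot=200, level=0.95, seed=0):
    """Percentile intervals for activities via parametric bootstrap.

    Assumption: replicates are drawn from a multinomial distribution whose
    probabilities are the spectrum reconstructed from the NNLS fit.
    """
    rng = np.random.default_rng(seed)
    point = attribute(signatures, counts)
    expected = signatures @ point  # spectrum implied by the fitted model
    if expected.sum() <= 0:
        raise ValueError("fitted spectrum is empty; nothing to resample")

    total = int(round(counts.sum()))
    probs = expected / expected.sum()
    # Each row is one simulated sample with the same mutation burden.
    draws = rng.multinomial(total, probs, size=n_boot)

    # Re-attribute every simulated sample with the same NNLS procedure.
    boot = np.array([attribute(signatures, d.astype(float)) for d in draws])

    alpha = (1.0 - level) / 2.0
    lower, upper = np.quantile(boot, [alpha, 1.0 - alpha], axis=0)
    return point, lower, upper


# Toy example: two hypothetical signatures over six mutation channels.
toy_signatures = np.array([
    [0.50, 0.02],
    [0.30, 0.03],
    [0.10, 0.05],
    [0.05, 0.10],
    [0.03, 0.30],
    [0.02, 0.50],
])
toy_counts = toy_signatures @ np.array([300.0, 100.0])
point, lower, upper = bootstrap_intervals(toy_signatures, toy_counts)
for k in range(len(point)):
    print(f"signature {k}: {point[k]:.1f} [{lower[k]:.1f}, {upper[k]:.1f}]")
```

Resampling from the fitted spectrum, rather than from the observed counts, is what makes the bootstrap parametric; the width of the resulting intervals reflects both the mutation burden of the sample and how similar the candidate signatures are to one another.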
MSA is a tool for optimised mutational signature attribution based on simulations, providing confidence intervals using parametric bootstrap. It comprises a set of Python scripts unified in a single Nextflow pipeline with containerisation for cross-platform reproducibility and scalability in high-performance computing environments. The tool is publicly available from https://gitlab.com/s.senkin/MSA .