Department of Molecular Medicine and Surgery, Karolinska Institutet, SE-171 76, Stockholm, Sweden.
Department of Clinical Genetics, Karolinska University Hospital, SE-171 76, Stockholm, Sweden.
BMC Bioinformatics. 2020 Apr 3;21(1):128. doi: 10.1186/s12859-020-3451-8.
DNA damage accumulates over the course of cancer development. The often-substantial number of somatic mutations in cancer poses a challenge to traditional methods that characterize tumors based on driver mutations. However, advances in machine learning make it possible to take advantage of this large amount of data.
We developed pyCancerSig, a command-line Python package that profiles samples by integrating single nucleotide variation (SNV), structural variation (SV) and microsatellite instability (MSI) profiles into a unified profile. It also provides a command to decipher underlying cancer processes using an unsupervised learning technique, non-negative matrix factorization (NMF), and a command to visualize the results. The package accepts common standard file formats (VCF, BAM). The program was evaluated using a cohort of breast and colorectal cancers from The Cancer Genome Atlas (TCGA) project. The results showed that, by integrating multiple mutation modes, the tool can correctly identify cases with known, clear mutational signatures and can strengthen signatures in cases where an SNV-only profile gives an unclear signal. The software package is available at https://github.com/jessada/pyCancerSig.
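The two ideas described above, concatenating per-mode profiles into one unified profile and then factorizing the resulting matrix with non-negative matrix factorization, can be illustrated with a minimal sketch. This is not pyCancerSig's code or API: the function names, the toy feature counts (4 SNV, 2 SV and 2 MSI features per sample) and the choice of Lee-Seung multiplicative updates are all assumptions made for illustration only.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def unify_profiles(snv, sv, msi):
    """Hypothetical helper: scale each mode to fractions, then concatenate."""
    unified = []
    for counts in (snv, sv, msi):
        total = sum(counts)
        unified.extend(c / total if total else 0.0 for c in counts)
    return unified


def nmf(v, rank, iterations=500, seed=0, eps=1e-9):
    """Factorize V (features x samples) into non-negative W and H.

    Uses Lee-Seung multiplicative updates for the Frobenius objective.
    W (features x rank) holds signatures, H (rank x samples) holds exposures.
    """
    rng = random.Random(seed)
    n_rows, n_cols = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(n_rows)]
    h = [[rng.random() + eps for _ in range(n_cols)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n_cols)]
             for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n_rows)]
    return w, h


def frobenius_norm(m):
    """Frobenius norm of a matrix stored as a list of rows."""
    return sum(x * x for row in m for x in row) ** 0.5


# Toy counts (invented for illustration): samples 1-2 share one pattern,
# samples 3-4 share another.
samples = [
    unify_profiles(snv=[70, 10, 10, 10], sv=[9, 1], msi=[0, 5]),
    unify_profiles(snv=[140, 20, 20, 20], sv=[18, 2], msi=[0, 10]),
    unify_profiles(snv=[10, 10, 10, 70], sv=[1, 9], msi=[5, 0]),
    unify_profiles(snv=[20, 20, 20, 140], sv=[2, 18], msi=[10, 0]),
]
V = transpose(samples)  # rows are features, columns are samples

signatures, exposures = nmf(V, rank=2)
reconstruction = matmul(signatures, exposures)
residual = [[a - b for a, b in zip(rv, rr)] for rv, rr in zip(V, reconstruction)]
error = frobenius_norm(residual)
print("reconstruction error:", error)
```

Scaling each mode to fractions before concatenation is one simple way to keep a mode with many events from dominating the others; the package itself may weight the modes differently.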
pyCancerSig has demonstrated its capability to identify known and unknown cancer processes while also illuminating the associations within and between the mutation modes.