Department of Pathology, Stanford University, Stanford, California, United States of America.
Department of Statistics, Stanford University, Stanford, California, United States of America.
PLoS Comput Biol. 2021 Jun 28;17(6):e1009119. doi: 10.1371/journal.pcbi.1009119. eCollection 2021 Jun.
Cancer is the result of mutagenic processes that can be inferred from tumor genomes by analyzing rate spectra of point mutations, or "mutational signatures". Here we present SparseSignatures, a novel framework to extract signatures from somatic point mutation data. Our approach incorporates a user-specified background signature, employs regularization to reduce noise in non-background signatures, uses cross-validation to identify the number of signatures, and is scalable to large datasets. We show that SparseSignatures outperforms current state-of-the-art methods on simulated data using a variety of standard metrics. We then apply SparseSignatures to whole genome sequences of pancreatic and breast tumors, discovering well-differentiated signatures that are linked to known mutagenic mechanisms and are strongly associated with patient clinical features.
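To make the ingredients named in the abstract concrete (a fixed, user-specified background signature plus a sparsity penalty on the remaining signatures), here is a minimal sketch in Python. It is not the SparseSignatures implementation: it swaps in penalized multiplicative NMF updates as a simple stand-in, and the function name `sparse_nmf_with_background`, the parameter `lam`, and the factor names `A` (exposures) and `B` (signatures) are all hypothetical choices made for illustration.

```python
import numpy as np


def sparse_nmf_with_background(V, background, n_signatures, lam=0.1,
                               n_iter=200, seed=0):
    """Factor a count matrix V (samples x mutation categories) as A @ B.

    Row 0 of B is held fixed at the supplied background signature; the
    other rows receive an L1 penalty of strength `lam`. Illustrative
    stand-in only (multiplicative updates), not the published algorithm.
    """
    rng = np.random.default_rng(seed)
    n_samples, n_categories = V.shape
    eps = 1e-12  # guards against division by zero

    # Random non-negative initialisation of exposures A and signatures B.
    A = rng.random((n_samples, n_signatures)) + eps
    B = rng.random((n_signatures, n_categories)) + eps
    B[0] = background

    # L1 penalty applies to every signature except the background (row 0).
    penalty = np.full((n_signatures, 1), float(lam))
    penalty[0] = 0.0

    for _ in range(n_iter):
        # Exposure update: standard multiplicative rule for squared error.
        A *= (V @ B.T) / (A @ B @ B.T + eps)
        # Signature update: the penalty in the denominator shrinks small
        # entries toward zero while keeping everything non-negative.
        B = B * (A.T @ V) / (A.T @ A @ B + penalty + eps)
        # Re-impose the user-specified background signature.
        B[0] = background

    return A, B
```

In this sketch, the number of signatures and the penalty strength are plain arguments. Following the abstract, one would choose them by cross-validation, for example by holding out a random subset of entries of `V`, fitting on the rest, and comparing prediction error on the held-out entries across candidate values.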