Islam S M Ashiqul, Díaz-Gay Marcos, Wu Yang, Barnes Mark, Vangara Raviteja, Bergstrom Erik N, He Yudou, Vella Mike, Wang Jingwei, Teague Jon W, Clapham Peter, Moody Sarah, Senkin Sergey, Li Yun Rose, Riva Laura, Zhang Tongwu, Gruber Andreas J, Steele Christopher D, Otlu Burçak, Khandekar Azhar, Abbasi Ammal, Humphreys Laura, Syulyukina Natalia, Brady Samuel W, Alexandrov Boian S, Pillay Nischalan, Zhang Jinghui, Adams David J, Martincorena Iñigo, Wedge David C, Landi Maria Teresa, Brennan Paul, Stratton Michael R, Rozen Steven G, Alexandrov Ludmil B
Department of Cellular and Molecular Medicine, UC San Diego, La Jolla, CA 92093, USA.
Department of Bioengineering, UC San Diego, La Jolla, CA 92093, USA.
Cell Genom. 2022 Nov 9;2(11). doi: 10.1016/j.xgen.2022.100179.
Mutational signature analysis is commonly performed in cancer genomic studies. Here, we present SigProfilerExtractor, an automated tool for extraction of mutational signatures, and benchmark it against 13 other bioinformatics tools using 34 scenarios encompassing 2,500 simulated signatures found in 60,000 synthetic genomes and 20,000 synthetic exomes. For simulations with 5% noise, reflecting high-quality datasets, SigProfilerExtractor outperforms other approaches by identifying between 20% and 50% more true-positive signatures while yielding 5-fold fewer false-positive signatures. Applying SigProfilerExtractor to 4,643 whole-genome-sequenced and 19,184 whole-exome-sequenced cancers reveals four novel signatures. Two of the signatures are confirmed in independent cohorts, and one of these signatures is associated with tobacco smoking. In summary, this report provides a reference tool for analysis of mutational signatures, a comprehensive benchmarking of bioinformatics tools for extracting signatures, and several novel mutational signatures, including one putatively attributed to direct tobacco smoking mutagenesis in bladder tissues.