Maruvka Yosef E, Mouw Kent W, Karlic Rosa, Parasuraman Prasanna, Kamburov Atanas, Polak Paz, Haradhvala Nicholas J, Hess Julian M, Rheinbay Esther, Brody Yehuda, Koren Amnon, Braunstein Lior Z, D'Andrea Alan, Lawrence Michael S, Bass Adam, Bernards Andre, Michor Franziska, Getz Gad
Massachusetts General Hospital Center for Cancer Research, Charlestown, Massachusetts, USA.
Broad Institute of Harvard and MIT, Cambridge, Massachusetts, USA.
Nat Biotechnol. 2017 Oct;35(10):951-959. doi: 10.1038/nbt.3966. Epub 2017 Sep 11.
Microsatellites (MSs) are tracts of variable-length repeats of short DNA motifs that exhibit high rates of mutation in the form of insertions or deletions (indels) of the repeated motif. Despite their prevalence, the contribution of somatic MS indels to cancer has been largely unexplored, owing to difficulties in detecting them in short-read sequencing data. Here we present two tools: MSMuTect, for accurate detection of somatic MS indels, and MSMutSig, for identification of genes containing MS indels at a higher frequency than expected by chance. Applying MSMuTect to whole-exome data from 6,747 human tumors representing 20 tumor types, we identified >1,000 previously undescribed MS indels in cancer genes. Additionally, we demonstrate that the number and pattern of MS indels can accurately distinguish microsatellite-stable tumors from tumors with microsatellite instability, thus potentially improving classification of clinically relevant subgroups. Finally, we identified seven MS indel driver hotspots: four in known cancer genes (ACVR2A, RNF43, JAK1, and MSH3) and three in genes not previously implicated as cancer drivers (ESRP1, PRDM2, and DOCK3).
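To make the abstract's definition of a microsatellite concrete, the following Python sketch locates tandem-repeat tracts in a DNA string and classifies a change in repeat count as an insertion or deletion of the motif. It is an illustration only, not the MSMuTect algorithm, which the abstract does not describe. The function names, the 1 to 6 base motif range, and the five-repeat minimum are all assumptions chosen for the example.

```python
import re


def find_microsatellites(seq, max_motif_len=6, min_repeats=5):
    """Return (start, motif, repeat_count) for each tandem-repeat tract in seq.

    Thresholds are illustrative assumptions, not values from the paper.
    """
    # The lazy group captures the shortest motif that works at a position;
    # the backreference then requires at least (min_repeats - 1) more copies.
    pattern = re.compile(
        r"([ACGT]{1,%d}?)\1{%d,}" % (max_motif_len, min_repeats - 1)
    )
    tracts = []
    for match in pattern.finditer(seq):
        motif = match.group(1)
        repeat_count = len(match.group(0)) // len(motif)
        tracts.append((match.start(), motif, repeat_count))
    return tracts


def classify_ms_indel(normal_repeats, tumor_repeats, motif):
    """Classify a repeat-count change as (kind, bases_changed).

    Hypothetical helper: a real caller must first infer repeat counts
    from noisy sequencing reads, which this sketch does not attempt.
    """
    diff = tumor_repeats - normal_repeats
    if diff == 0:
        return ("none", 0)
    kind = "insertion" if diff > 0 else "deletion"
    return (kind, abs(diff) * len(motif))


if __name__ == "__main__":
    # Toy sequence: an A homopolymer (8 copies) and a CA dinucleotide tract (6 copies).
    reference = "GGC" + "A" * 8 + "T" + "CA" * 6 + "GT"
    for start, motif, count in find_microsatellites(reference):
        print(start, motif, count)
    print(classify_ms_indel(8, 7, "A"))
```

The sketch works on an exact sequence. Detecting somatic MS indels in short-read data is harder, because the reads themselves contain errors in repeat length. That is the difficulty the abstract cites as the reason for a dedicated tool.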