Departments of Biochemistry & Molecular Biology and Medical Genetics, Alberta Children's Hospital Research Institute, University of Calgary, Calgary, Canada.
Department of Cancer Biology, Wake Forest School of Medicine, Winston-Salem, NC, USA.
Bioinformatics. 2019 May 1;35(9):1445-1452. doi: 10.1093/bioinformatics/bty812.
Accurate detection of somatic mutations is a crucial step toward understanding cancer. Various tools have been developed to detect somatic mutations from cancer genome sequencing data by mapping reads to a universal reference genome and inferring likelihoods from complex statistical models. However, read mapping is frequently hindered by mismatches between a read, which carries both germline and somatic mutations, and the reference genome. Previously developed personalized genome tools are not compatible with the downstream statistical models used for somatic mutation detection.
We present PRESM, a tool that builds personalized reference genomes by integrating germline mutations into the reference genome. The aforementioned obstacle is circumvented by a two-step germline substitution procedure that maintains positional fidelity through an innovative workaround. Reads derived from tumor tissue can be positioned more accurately along a personalized reference than along a universal reference because the genetic distance between the subject (the tumor genome) and the target (the personalized genome) is reduced. Applying PRESM's personalized genome reduced false-positive (FP) somatic mutation calls by as much as 55.5% and facilitated the discovery of a novel somatic point mutation on a germline insertion in PDE1A, a phosphodiesterase associated with melanoma. Moreover, all improvements in calling accuracy were achieved without parameter optimization, as PRESM itself is parameter-free. Hence, similar improvements in read mapping and decreases in the FP rate are expected to persist when PRESM-built genomes are applied to any user-provided dataset.
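The general idea of integrating germline variants into a reference while keeping positions traceable can be illustrated with a small sketch. This is not PRESM's implementation, and it does not reproduce the two-step substitution procedure, whose details are not given here. It assumes 0-based positions, non-overlapping variants with non-empty alleles, and uses hypothetical helper names (`build_personalized`, `lift_to_reference`). Bases inside a substituted allele are reported at the variant's reference position, which is one possible convention among several.

```python
from bisect import bisect_right

# Illustrative sketch only; names and conventions are assumptions, not PRESM's API.

def build_personalized(reference, variants):
    """Substitute germline alleles into `reference`.

    variants: iterable of (pos, ref_allele, alt_allele), 0-based, non-overlapping,
    with non-empty alleles (VCF-style anchored indels).
    Returns (personalized_sequence, blocks), where each block is
    (personal_start, reference_start, is_variant).
    """
    pieces, blocks = [], []
    cursor = 0    # next reference position not yet copied
    out_len = 0   # length of the personalized sequence so far
    for pos, ref, alt in sorted(variants):
        if not ref or not alt:
            raise ValueError("alleles must be non-empty")
        if pos < cursor:
            raise ValueError("overlapping variants are not supported")
        if reference[pos:pos + len(ref)] != ref:
            raise ValueError(f"reference allele mismatch at {pos}")
        if pos > cursor:  # copy the unchanged stretch before the variant
            blocks.append((out_len, cursor, False))
            pieces.append(reference[cursor:pos])
            out_len += pos - cursor
        blocks.append((out_len, pos, True))  # substituted germline allele
        pieces.append(alt)
        out_len += len(alt)
        cursor = pos + len(ref)
    if cursor < len(reference):  # copy the unchanged tail
        blocks.append((out_len, cursor, False))
        pieces.append(reference[cursor:])
    return "".join(pieces), blocks


def lift_to_reference(blocks, personal_pos):
    """Map a 0-based position on the personalized sequence back to the reference."""
    starts = [b[0] for b in blocks]
    i = bisect_right(starts, personal_pos) - 1
    personal_start, ref_start, is_variant = blocks[i]
    if is_variant:
        return ref_start  # convention: report the variant's reference position
    return ref_start + (personal_pos - personal_start)


# Toy example: one SNV (G>T at 2) and one insertion (C>CTT at 5).
reference = "ACGTACGTAC"
germline = [(2, "G", "T"), (5, "C", "CTT")]
personal, blocks = build_personalized(reference, germline)
```

A caller mapped to the personalized sequence could then translate each reported position with `lift_to_reference(blocks, pos)` before comparing calls against annotations on the universal reference.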
The software is available at https://github.com/precisionomics/PRESM.
Supplementary data are available at Bioinformatics online.