McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA.
Siteman Cancer Center, Washington University School of Medicine, St. Louis, MO, USA.
Nat Genet. 2018 Dec;50(12):1735-1743. doi: 10.1038/s41588-018-0257-y. Epub 2018 Nov 5.
Cancer genomic analysis requires accurate identification of somatic variants in sequencing data. Manual review to refine somatic variant calls is required as a final step after automated processing. However, manual variant refinement is time-consuming, costly, poorly standardized, and non-reproducible. Here, we systematized and standardized somatic variant refinement using a machine learning approach. The final model incorporates 41,000 variants from 440 sequencing cases. This model accurately recapitulated manual refinement labels for three independent testing sets (13,579 variants) and accurately predicted somatic variants confirmed by orthogonal validation sequencing data (212,158 variants). The model improves on manual somatic refinement by reducing bias on calls otherwise subject to high inter-reviewer variability.
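As an illustration of what "recapitulating manual refinement labels" could mean in practice, the sketch below compares a list of model calls against a list of manual review calls using raw agreement and Cohen's kappa. This is not the authors' evaluation code: the abstract does not state which agreement metric was used, so Cohen's kappa is an assumption chosen here because it is a common measure of inter-reviewer agreement, and the label names and example data are hypothetical.

```python
from collections import Counter


def agreement_rate(manual, predicted):
    """Fraction of variants where the model call matches the manual call."""
    if len(manual) != len(predicted) or not manual:
        raise ValueError("label lists must be non-empty and of equal length")
    matches = sum(m == p for m, p in zip(manual, predicted))
    return matches / len(manual)


def cohens_kappa(manual, predicted):
    """Chance-corrected agreement between two sets of calls (Cohen's kappa)."""
    n = len(manual)
    observed = agreement_rate(manual, predicted)
    manual_counts = Counter(manual)
    predicted_counts = Counter(predicted)
    # Agreement expected by chance if both raters labelled independently
    # with their own observed label frequencies.
    expected = sum(
        manual_counts[label] * predicted_counts[label] for label in manual_counts
    ) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical label; agreement is trivially perfect.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical calls for six variants (labels are illustrative only).
manual_calls = ["somatic", "somatic", "fail", "ambiguous", "somatic", "fail"]
model_calls = ["somatic", "somatic", "fail", "fail", "somatic", "fail"]

print(f"agreement: {agreement_rate(manual_calls, model_calls):.3f}")
print(f"kappa:     {cohens_kappa(manual_calls, model_calls):.3f}")
```

Raw agreement alone can look high when one label dominates, which is why a chance-corrected statistic is often reported alongside it when reviewer variability is the concern.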