Roche Sequencing Solutions, Santa Clara, CA, 95050, USA.
The Center for Biologics Evaluation and Research, U.S. Food and Drug Administration, 10903 New Hampshire Avenue, Silver Spring, MD, 20993, USA.
Genome Biol. 2022 Jan 7;23(1):12. doi: 10.1186/s13059-021-02592-9.
Accurate detection of somatic mutations is challenging but critical in understanding cancer formation, progression, and treatment. We recently proposed NeuSomatic, the first deep convolutional neural network-based somatic mutation detection approach, and demonstrated performance advantages on in silico data.
In this study, we use the first comprehensive and well-characterized somatic reference data sets from the SEQC2 consortium to investigate best practices for using a deep learning framework in cancer mutation detection. Using the high-confidence somatic mutations established for a cancer cell line by the consortium, we identify the best strategy for building robust models on multiple data sets derived from samples representing real scenarios. For example, a model trained on a combination of real and spike-in mutations had the highest average performance.
The strategy identified in our study achieved high robustness across multiple sequencing technologies for fresh and FFPE DNA input, varying tumor/normal purities, and different coverages, with significant superiority over conventional detection approaches in general, as well as in challenging situations such as low coverage, low variant allele frequency, DNA damage, and difficult genomic regions.