Department of Medical Oncology, Erasmus MC Cancer Institute, Erasmus University Medical Center, Rotterdam, The Netherlands.
Delft Bioinformatics Lab, Delft University of Technology, Delft, The Netherlands.
Sci Rep. 2023 Jun 27;13(1):10424. doi: 10.1038/s41598-023-37409-1.
Next-generation sequencing of cell-free DNA (cfDNA) is a promising method for treatment monitoring and therapy selection in metastatic breast cancer (MBC). However, distinguishing tumor-specific variants from sequencing artefacts and germline variation with a low false discovery rate is challenging when using large targeted sequencing panels covering many tumor suppressor genes. To address this, we built a machine learning model to remove false positive variant calls and augmented it with additional filters to ensure selection of tumor-derived variants. We used cfDNA of 70 MBC patients profiled with both the small targeted Oncomine breast panel (Thermofisher) and the much larger Qiaseq Human Breast Cancer Panel (Qiagen). The model was trained on the panels' common regions using Oncomine hotspot mutations as ground truth. Applied to Qiaseq data, it achieved 35% sensitivity and 36% precision, outperforming basic filtering. For 20 patients, filtering for somatic variants with germline DNA yielded 245 variants in total, whereas our model found seven variants, six of which were also detected using the germline strategy. In ten tumor-free individuals, our method detected a single (potentially germline) variant in total, in contrast to 521 variants detected without our model. These results indicate that our model largely detects somatic variants.
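To make the reported metrics concrete, the following is a minimal sketch of how sensitivity and precision can be computed when calls from one panel are treated as ground truth for another, restricted to regions both panels cover. It is not the authors' pipeline: the function name `sensitivity_precision`, the representation of a variant as a (chromosome, position, ref, alt) tuple, and the toy call sets are all assumptions made for illustration.

```python
from typing import Hashable, Iterable, Tuple


def sensitivity_precision(
    truth: Iterable[Hashable], calls: Iterable[Hashable]
) -> Tuple[float, float]:
    """Compare a set of called variants against a ground-truth set.

    sensitivity = true positives / all ground-truth variants
    precision   = true positives / all called variants
    Returns NaN for a metric whose denominator is zero.
    """
    truth_set, call_set = set(truth), set(calls)
    true_positives = len(truth_set & call_set)
    sensitivity = true_positives / len(truth_set) if truth_set else float("nan")
    precision = true_positives / len(call_set) if call_set else float("nan")
    return sensitivity, precision


# Toy data (made up): variants keyed as (chromosome, position, ref, alt).
truth = {
    ("chr1", 100, "A", "G"),
    ("chr2", 200, "C", "T"),
    ("chr3", 300, "G", "A"),
}
calls = {
    ("chr1", 100, "A", "G"),  # matches ground truth
    ("chr2", 200, "C", "T"),  # matches ground truth
    ("chr4", 400, "T", "C"),  # not in ground truth
    ("chr5", 500, "G", "C"),  # not in ground truth
}

sens, prec = sensitivity_precision(truth, calls)
print(f"sensitivity={sens:.2f} precision={prec:.2f}")
```

In this toy case two of the three ground-truth variants are recovered and two of the four calls are correct. Matching on exact tuples is a simplification; a real comparison would also need consistent variant normalization across panels.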