Zheng Tian
Department of Computer Science and Technology, School of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China.
Institute of Data Science and Information Quality, Shaanxi Engineering Research Center of Medical and Health Big Data, Xi'an Jiaotong University, Xi'an, China.
Front Genet. 2022 Nov 22;13:981269. doi: 10.3389/fgene.2022.981269. eCollection 2022.
Mutation detection is a routine task in sequencing data analysis, and existing tools typically rely on combining signals from a set of overlapping sequencing reads. However, subclonal mutations, which are reported to contribute to tumor recurrence and metastasis, are sometimes filtered out by these signals. As the clonal proportion decreases, the signals often become ambiguous, and complicated interactions among signals break the independent and identically distributed (IID) assumption underlying most machine learning models. Although mutation callers could lower their thresholds, doing so introduces a significant number of false positives. The main aim here was to detect subclonal mutations with high specificity when sample purities or clonal proportions are ambiguous. We propose a novel machine learning approach that filters false positive calls in order to accurately detect mutations across a wide spectrum of subclonal proportions. We carried out a series of experiments on both simulated and real datasets and compared the method with several state-of-the-art approaches, including freebayes, MuTect2, Sentieon and SiNVICT. The results demonstrate that the proposed method adapts well to differently diluted sequencing signals and can significantly reduce false positives when detecting subclonal mutations. The code is available at https://github.com/TrinaZ/TL-fpFilter for academic use only.