TLsub: A transfer learning based enhancement to accurately detect mutations with wide-spectrum sub-clonal proportion.

Author information

Zheng Tian

Affiliations

Department of Computer Science and Technology, School of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China.

Institute of Data Science and Information Quality, Shaanxi Engineering Research Center of Medical and Health Big Data, Xi'an Jiaotong University, Xi'an, China.

Publication information

Front Genet. 2022 Nov 22;13:981269. doi: 10.3389/fgene.2022.981269. eCollection 2022.

Abstract

Mutation detection is a routine task in sequencing data analysis, and existing tools typically call mutations by combining signals across a set of overlapping sequencing reads. However, subclonal mutations, which are reported to contribute to tumor recurrence and metastasis, are sometimes filtered out by these signals. As the clonal proportion decreases, the signals often become ambiguous, while complicated interactions among signals violate the independent and identically distributed (IID) assumption underlying most machine learning models. Although mutation callers can lower their thresholds, doing so introduces a significant number of false positives. The main aim here was to detect subclonal mutations with high specificity when sample purity or clonal proportion is ambiguous. We proposed a novel machine learning approach that filters false positive calls to accurately detect mutations across a wide spectrum of subclonal proportions. We carried out a series of experiments on both simulated and real datasets and compared the approach with several state-of-the-art methods, including freebayes, MuTect2, Sentieon and SiNVICT. The results demonstrated that the proposed method adapts well to differently diluted sequencing signals and can significantly reduce false positives when detecting subclonal mutations. The code is available at https://github.com/TrinaZ/TL-fpFilter for academic use only.
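
To illustrate why signals become ambiguous as the clonal proportion decreases, the following minimal sketch computes the expected variant allele frequency (VAF) of a somatic mutation under a standard mixture model. It is not taken from the TLsub code; the function names, the default copy numbers, and the example purity, subclone fraction, and depth values are all assumptions chosen for illustration.

```python
def expected_vaf(purity, subclone_fraction, tumor_copy_number=2,
                 mutated_copies=1, normal_copy_number=2):
    """Expected VAF of a somatic mutation in a tumor/normal mixture.

    Hypothetical helper (not from the paper): a textbook mixture model in
    which a fraction `purity` of cells are tumor cells and a fraction
    `subclone_fraction` of those tumor cells carry the mutation.
    """
    if not 0.0 <= purity <= 1.0:
        raise ValueError("purity must be in [0, 1]")
    if not 0.0 <= subclone_fraction <= 1.0:
        raise ValueError("subclone_fraction must be in [0, 1]")
    # Alleles carrying the mutation, per average cell in the sample.
    mutant_alleles = purity * subclone_fraction * mutated_copies
    # All alleles at the locus, per average cell in the sample.
    total_alleles = (purity * tumor_copy_number
                     + (1.0 - purity) * normal_copy_number)
    return mutant_alleles / total_alleles


def expected_alt_reads(vaf, depth):
    """Expected number of reads supporting the variant at a given depth."""
    return vaf * depth


if __name__ == "__main__":
    depth = 30  # assumed sequencing depth, for illustration only
    for purity, fraction in [(1.0, 1.0), (0.5, 0.5), (0.5, 0.2)]:
        vaf = expected_vaf(purity, fraction)
        reads = expected_alt_reads(vaf, depth)
        print(f"purity={purity}, subclone fraction={fraction}: "
              f"VAF={vaf:.3f}, expected alt reads={reads:.1f}")
```

Under these assumptions, a heterozygous mutation present in 20% of tumor cells at 50% purity has an expected VAF of 0.05, so only one or two supporting reads are expected at 30x depth. That level is comparable to what sequencing errors can produce, which is why lowering a caller's threshold tends to admit false positives.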

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1653/9723383/dc0a4096cb94/fgene-13-981269-g001.jpg

Similar articles

1
TLsub: A transfer learning based enhancement to accurately detect mutations with wide-spectrum sub-clonal proportion.
Front Genet. 2022 Nov 22;13:981269. doi: 10.3389/fgene.2022.981269. eCollection 2022.
2
SiNVICT: ultra-sensitive detection of single nucleotide variants and indels in circulating tumour DNA.
Bioinformatics. 2017 Jan 1;33(1):26-34. doi: 10.1093/bioinformatics/btw536. Epub 2016 Aug 16.
3
A machine learning framework for genotyping the structural variations with copy number variant.
BMC Med Genomics. 2020 Aug 27;13(Suppl 6):79. doi: 10.1186/s12920-020-00733-w.
4
CARE 2.0: reducing false-positive sequencing error corrections using machine learning.
BMC Bioinformatics. 2022 Jun 13;23(1):227. doi: 10.1186/s12859-022-04754-3.
5
A machine learning framework for accurately recognizing circular RNAs for clinical decision-supporting.
BMC Med Inform Decis Mak. 2020 Jul 9;20(Suppl 3):137. doi: 10.1186/s12911-020-1117-0.
6
DETexT: An SNV detection enhancement for low read depth by integrating mutational signatures into TextCNN.
Front Genet. 2022 Sep 28;13:943972. doi: 10.3389/fgene.2022.943972. eCollection 2022.
7
The MOBSTER R package for tumour subclonal deconvolution from bulk DNA whole-genome sequencing data.
BMC Bioinformatics. 2020 Nov 17;21(1):531. doi: 10.1186/s12859-020-03863-1.
8
A Pipeline for Reconstructing Somatic Copy Number Alternation's Subclonal Population-Based Next-Generation Sequencing Data.
Front Genet. 2020 Feb 27;10:1374. doi: 10.3389/fgene.2019.01374. eCollection 2019.
