RareVar: A Framework for Detecting Low-Frequency Single-Nucleotide Variants.

Author information

Hao Yangyang, Xuei Xiaoling, Li Lang, Nakshatri Harikrishna, Edenberg Howard J, Liu Yunlong

Affiliations

1 Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, Indiana.

2 Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, Indiana.

Publication information

J Comput Biol. 2017 Jul;24(7):637-646. doi: 10.1089/cmb.2017.0057. Epub 2017 May 25.

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5510701/

Abstract

Accurate identification of low-frequency somatic point mutations in tumor samples has important clinical utility. Although high-throughput sequencing can capture such variants when primary tumor samples are sequenced, our ability to detect them accurately is compromised when the variant allele frequency approaches the sequencer's error rate. Most current experimental and bioinformatic strategies target mutations with an allele frequency of ≥5%, which limits our ability to understand cancer etiology and tumor evolution. We present RareVar, an experimental and computational modeling framework for reliably identifying low-frequency single-nucleotide variants from high-throughput sequencing data generated under standard experimental protocols. The RareVar protocol includes a benchmark design in which DNA from already-sequenced individuals is pooled at various concentrations so that their variants appear at desired frequencies (0.5%-3% in our case). By applying a position-specific error model based on a generalized linear model, followed by machine-learning-based variant calibration, our approach outperforms existing methods. Our method can be applied to most capture and sequencing platforms without modifying the experimental protocol.
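The benchmark design relies on simple dilution arithmetic: a variant carried by only one pooled individual appears at an allele frequency set by that individual's share of the DNA. The sketch below is a minimal illustration, assuming diploid genomes, pooling fractions measured by DNA amount, and equal capture and sequencing efficiency for every input molecule. The function names `expected_vaf` and `fraction_for_target_vaf` are hypothetical and not part of RareVar.

```python
def expected_vaf(fractions, alt_copies):
    """Expected variant allele frequency (VAF) at one site in a DNA pool.

    fractions  -- share of the pooled DNA contributed by each individual (sums to 1)
    alt_copies -- alternate alleles carried by each individual (0, 1 or 2)

    Assumes diploid genomes and equal capture/sequencing of every molecule.
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("pooling fractions must sum to 1")
    # Each individual contributes f * (alt copies / 2) of the alleles at the site.
    return sum(f * c / 2 for f, c in zip(fractions, alt_copies))


def fraction_for_target_vaf(target_vaf, alt_copies=1):
    """DNA fraction a single carrier must contribute to reach target_vaf,
    assuming no other pooled individual carries the same variant."""
    return target_vaf * 2 / alt_copies


# A heterozygous variant private to an individual contributing 1% of the DNA:
print(expected_vaf([0.01, 0.99], [1, 0]))   # 0.005, i.e. 0.5%
# Share of DNA a heterozygous carrier needs for a 3% variant:
print(fraction_for_target_vaf(0.03))        # 0.06
```

In practice, capture bias and pipetting error would make observed frequencies scatter around these expectations, which is one reason to validate a pool empirically rather than trust the nominal values.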

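The error-modeling step can be illustrated with a binomial generalized linear model fitted separately at each genomic position. The sketch below assumes that, at one position, mismatch counts and depths are available from background samples known not to carry a variant, together with one per-sample covariate (a hypothetical mean base quality). It fits logit(error rate) = b0 + b1·covariate by iteratively reweighted least squares, then asks with a one-sided binomial test whether a test sample's alternate-read count exceeds the predicted error. The covariate, the function names, and the choice of test are illustrative assumptions, not RareVar's published model.

```python
import math


def fit_position_glm(x, alt, depth, iters=25, tol=1e-10):
    """Binomial GLM with logit link, logit(p) = b0 + b1 * x, at one position.

    x     -- per-sample covariate (hypothetical, e.g. mean base quality);
             needs at least two distinct values
    alt   -- mismatch (non-reference) read counts in background samples
    depth -- total read depths in the same samples
    Returns (b0, b1) estimated by iteratively reweighted least squares (IRLS).
    """
    pooled = sum(alt) / sum(depth)
    if not 0.0 < pooled < 1.0:
        raise ValueError("need 0 < pooled error rate < 1 to initialise the fit")
    b0, b1 = math.log(pooled / (1.0 - pooled)), 0.0  # start at the pooled rate
    for _ in range(iters):
        # Score vector g and Fisher information H of the binomial log-likelihood.
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, ki, ni in zip(x, alt, depth):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = ni * p * (1.0 - p)
            r = ki - ni * p
            g0 += r
            g1 += r * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        # Newton/IRLS step: solve the 2x2 system H * delta = g.
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) < tol and abs(d1) < tol:
            break
    return b0, b1


def predicted_error(b0, b1, x):
    """Position-specific error rate predicted by the fitted GLM at covariate x."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))


def binom_upper_tail(k, n, p):
    """P(K >= k) for K ~ Binomial(n, p), summed in log space to avoid overflow."""
    if k <= 0:
        return 1.0
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for j in range(k, n + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(j + 1) - math.lgamma(n - j + 1)
                   + j * log_p + (n - j) * log_q)
        total += math.exp(log_pmf)
    return min(total, 1.0)


# Hypothetical background data at one position: covariate, mismatch reads, depth.
x = [20, 25, 30, 35, 40]
alt = [18, 14, 11, 9, 7]
depth = [1000] * 5
b0, b1 = fit_position_glm(x, alt, depth)
p_err = predicted_error(b0, b1, 30)
print(round(p_err, 4))
print(binom_upper_tail(30, 1000, p_err))  # well above expected error: small p-value
print(binom_upper_tail(11, 1000, p_err))  # consistent with error alone: large p-value
```

Fitting the model per position, rather than using one genome-wide error rate, is what lets recurrent, site-specific artifacts be absorbed into the background; a 0.5% variant at depth 1000 yields only about five alternate reads, so it is detectable only where the local error rate is well below that.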

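Because the pooled individuals have already been sequenced, the variants each one contributes are known in advance, so the benchmark doubles as a truth set against which calls can be scored and methods compared. The sketch below assumes that variants are keyed by (chromosome, position, alternate base) and that sensitivity and precision are the metrics of interest; the key format, metrics, function name, and coordinates are hypothetical choices, not taken from the paper.

```python
from typing import Iterable, Tuple

# Hypothetical variant key: (chromosome, 1-based position, alternate base).
Variant = Tuple[str, int, str]


def score_calls(calls: Iterable[Variant], truth: Iterable[Variant]) -> dict:
    """Compare a call set with the benchmark truth set of pooled variants."""
    call_set, truth_set = set(calls), set(truth)
    tp = len(call_set & truth_set)   # true variants that were called
    fp = len(call_set - truth_set)   # calls absent from the truth set
    fn = len(truth_set - call_set)   # true variants that were missed
    sensitivity = tp / (tp + fn) if truth_set else float("nan")
    precision = tp / (tp + fp) if call_set else float("nan")
    return {"tp": tp, "fp": fp, "fn": fn,
            "sensitivity": sensitivity, "precision": precision}


# Made-up coordinates for illustration only.
truth = [("chr1", 1000, "T"), ("chr2", 2500, "G"), ("chr3", 3000, "A")]
calls = [("chr1", 1000, "T"), ("chr3", 3000, "A"), ("chr9", 500, "C")]
print(score_calls(calls, truth))  # 2 true positives, 1 false positive, 1 false negative
```

Scoring separately within each target frequency (for example, 0.5% versus 3%) would show where a caller's sensitivity falls off as variant frequency nears the error rate.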