A SNP discovery method to assess variant allele probability from next-generation resequencing data.

Affiliation

The Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA.

Publication information

Genome Res. 2010 Feb;20(2):273-80. doi: 10.1101/gr.096388.109. Epub 2009 Dec 17.

DOI:10.1101/gr.096388.109

PMID:20019143

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC2813483/

Abstract

Accurate identification of genetic variants from next-generation sequencing (NGS) data is essential for immediate large-scale genomic endeavors such as the 1000 Genomes Project, and is crucial for further genetic analysis based on the discoveries. The key challenge in single nucleotide polymorphism (SNP) discovery is to distinguish true individual variants (occurring at a low frequency) from sequencing errors (often occurring at frequencies orders of magnitude higher). Therefore, knowledge of the error probabilities of base calls is essential. We have developed Atlas-SNP2, a computational tool that detects and accounts for systematic sequencing errors caused by context-related variables in a logistic regression model learned from training data sets. Subsequently, it estimates the posterior error probability for each substitution through a Bayesian formula that integrates prior knowledge of the overall sequencing error probability and the estimated SNP rate with the results from the logistic regression model for the given substitutions. The estimated posterior SNP probability can be used to distinguish true SNPs from sequencing errors. Validation results show that Atlas-SNP2 achieves a false-positive rate of lower than 10%, with an approximately 5% or lower false-negative rate.
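The two-stage idea described in the abstract (a logistic regression score per substitution, followed by a Bayesian update with prior error and SNP rates) can be illustrated with a minimal sketch. This is not the Atlas-SNP2 implementation: the covariates, weights, prior values, and the choice to plug the regression score in as a likelihood are all assumptions made here for illustration, and the function names are hypothetical.

```python
import math


def logistic(z):
    """Standard logistic function mapping a linear score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def substitution_error_score(covariates, weights, intercept):
    """Logistic-regression output for one substitution.

    The covariates and weights are hypothetical; the abstract states only
    that the model uses context-related variables learned from training data.
    """
    z = intercept + sum(w * x for w, x in zip(weights, covariates))
    return logistic(z)


def posterior_snp(lik_given_snp, lik_given_error, prior_snp, prior_error):
    """Two-hypothesis Bayes rule: P(SNP | evidence)."""
    numerator = lik_given_snp * prior_snp
    denominator = numerator + lik_given_error * prior_error
    return numerator / denominator


# Illustrative numbers only (not taken from the paper).
p_err = substitution_error_score([30.0, 0.5], [-0.2, 1.0], 2.0)
# Assumption: the regression score stands in for the evidence under each hypothesis.
post = posterior_snp(
    lik_given_snp=1.0 - p_err,
    lik_given_error=p_err,
    prior_snp=0.001,
    prior_error=0.01,
)
print(f"error score = {p_err:.4f}, posterior SNP probability = {post:.4f}")
```

A threshold on the resulting posterior would then separate candidate SNPs from likely sequencing errors; where that threshold sits trades false positives against false negatives.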
