A beginners guide to SNP calling from high-throughput DNA-sequencing data.

Affiliation

Statistical Genetics, Max Planck Institute of Psychiatry, Kraepelinstrasse 2-10, 80804 Munich, Germany.

Publication information

Hum Genet. 2012 Oct;131(10):1541-54. doi: 10.1007/s00439-012-1213-z. Epub 2012 Aug 11.

Abstract

High-throughput DNA sequencing (HTS) is of increasing importance in the life sciences. One of its most prominent applications is the sequencing of whole genomes or targeted regions of the genome such as all exonic regions (i.e., the exome). Here, the objective is the identification of genetic variants such as single nucleotide polymorphisms (SNPs). The extraction of SNPs from the raw genetic sequences involves many processing steps and the application of a diverse set of tools. We review the essential building blocks for a pipeline that calls SNPs from raw HTS data. The pipeline includes quality control, mapping of short reads to the reference genome, and visualization and post-processing of the alignment, including base quality recalibration. The final steps of the pipeline are the SNP calling procedure and the filtering of SNP candidates. The steps of this pipeline are accompanied by an analysis of a publicly available whole-exome sequencing dataset. To this end, we employ several alignment programs and SNP calling routines to highlight that the choice of tools significantly affects the final results.
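The last two pipeline steps named in the abstract, SNP calling and filtering of SNP candidates, can be illustrated with a minimal sketch of a per-site diploid genotype-likelihood model. This is not the authors' pipeline or the model of any particular tool. It assumes independent reads, a flat genotype prior, a known candidate alternative allele, and that an erroneous base is equally likely to be any of the other three bases. A Phred base quality Q is converted to an error probability 10^(-Q/10), and a genotype quality (GQ) is taken, in the spirit of the VCF GQ field, as the Phred-scaled gap between the best and second-best genotype likelihoods. The function names and the thresholds `min_base_q`, `min_depth` and `min_gq` are hypothetical and chosen only for illustration.

```python
import math


def phred_to_error(q):
    """Convert a Phred-scaled base quality Q into an error probability 10^(-Q/10)."""
    return 10 ** (-q / 10)


def genotype_log10_likelihoods(pileup, ref, alt, min_base_q=10):
    """Log10 likelihoods of the diploid genotypes RR, RA and AA at one site.

    pileup: list of (base, phred_quality) tuples observed at the position.
    Bases below the hypothetical min_base_q threshold are skipped (simple QC).
    """
    used = [(base, q) for base, q in pileup if q >= min_base_q]
    genotypes = {"RR": (ref, ref), "RA": (ref, alt), "AA": (alt, alt)}
    loglik = {}
    for name, (a1, a2) in genotypes.items():
        total = 0.0
        for base, q in used:
            e = phred_to_error(q)
            # P(observed base | true allele): 1 - e on a match, e / 3 otherwise
            p1 = 1 - e if base == a1 else e / 3
            p2 = 1 - e if base == a2 else e / 3
            # Each read comes from either chromosome copy with probability 1/2
            total += math.log10(0.5 * p1 + 0.5 * p2)
        loglik[name] = total
    return loglik, len(used)


def call_genotype(pileup, ref, alt, min_depth=10, min_gq=20):
    """Return (best genotype, genotype quality, passes hypothetical filters)."""
    loglik, depth = genotype_log10_likelihoods(pileup, ref, alt)
    # Flat prior: the genotype with the highest likelihood is called
    ranked = sorted(loglik, key=loglik.get, reverse=True)
    best, second = ranked[0], ranked[1]
    # Phred-scaled gap between the best and second-best genotype
    gq = 10 * (loglik[best] - loglik[second])
    return best, gq, depth >= min_depth and gq >= min_gq


if __name__ == "__main__":
    # Six reference (A) and six alternative (G) reads, all with base quality 30
    het_site = [("A", 30)] * 6 + [("G", 30)] * 6
    # Expected: a heterozygous call "RA" that passes both filters
    print(call_genotype(het_site, ref="A", alt="G"))
```

Real callers additionally use genotype priors, mapping qualities and recalibrated base qualities, which this sketch omits; the depth and GQ thresholds stand in for the candidate filtering step.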

Similar articles

Impact of post-alignment processing in variant discovery from whole exome data.

BMC Bioinformatics. 2016 Oct 3;17(1):403. doi: 10.1186/s12859-016-1279-z.

Read trimming has minimal effect on bacterial SNP-calling accuracy.

Microb Genom. 2020 Dec;6(12). doi: 10.1099/mgen.0.000434. Epub 2020 Dec 11.

Review of alignment and SNP calling algorithms for next-generation sequencing data.

J Appl Genet. 2016 Feb;57(1):71-9. doi: 10.1007/s13353-015-0292-7. Epub 2015 Jun 9.

WEP: a high-performance analysis pipeline for whole-exome data.

BMC Bioinformatics. 2013;14 Suppl 7(Suppl 7):S11. doi: 10.1186/1471-2105-14-S7-S11. Epub 2013 Apr 22.

ComB: SNP calling and mapping analysis for color and nucleotide space platforms.

J Comput Biol. 2011 Jun;18(6):795-807. doi: 10.1089/cmb.2011.0027. Epub 2011 May 12.

SNP calling by sequencing pooled samples.

BMC Bioinformatics. 2012 Sep 20;13:239. doi: 10.1186/1471-2105-13-239.

UGbS-Flex, a novel bioinformatics pipeline for imputation-free SNP discovery in polyploids without a reference genome: finger millet as a case study.

BMC Plant Biol. 2018 Jun 15;18(1):117. doi: 10.1186/s12870-018-1316-3.

Improving mapping and SNP-calling performance in multiplexed targeted next-generation sequencing.

BMC Genomics. 2012 Aug 22;13:417. doi: 10.1186/1471-2164-13-417.

An integrated SNP mining and utilization (ISMU) pipeline for next generation sequencing data.

PLoS One. 2014 Jul 8;9(7):e101754. doi: 10.1371/journal.pone.0101754. eCollection 2014.

Cited by

Discovery of variation in genes related to agronomic traits by sequencing the genome of Cucurbita pepo varieties.

BMC Genomics. 2025 Apr 3;26(1):335. doi: 10.1186/s12864-025-11370-x.

Benchmarking UMI-aware and standard variant callers for low frequency ctDNA variant detection.

BMC Genomics. 2024 Sep 3;25(1):827. doi: 10.1186/s12864-024-10737-w.

MIST: A microbial identification and source tracking system for next-generation sequencing data.

Imeta. 2023 Nov 2;2(4):e146. doi: 10.1002/imt2.146. eCollection 2023 Nov.

Investigating the potential roles of intra-colonial genetic variability in Pocillopora corals using genomics.

Sci Rep. 2024 Mar 18;14(1):6437. doi: 10.1038/s41598-024-57136-5.

Identification of eQTLs using different sets of single nucleotide polymorphisms associated with carcass and body composition traits in pigs.

BMC Genomics. 2024 Jan 2;25(1):14. doi: 10.1186/s12864-023-09863-8.

Potential Targeted Therapies in Ovarian Cancer.

Pharmaceuticals (Basel). 2022 Oct 26;15(11):1324. doi: 10.3390/ph15111324.

Innovative Approaches for Characterization of Genes and Proteins.

Front Genet. 2022 May 18;13:865182. doi: 10.3389/fgene.2022.865182. eCollection 2022.

Genome-wide evolutionary response of European oaks during the Anthropocene.

Evol Lett. 2022 Jan 5;6(1):4-20. doi: 10.1002/evl3.269. eCollection 2022 Feb.

Assessing Bos taurus introgression in the UOA Bos indicus assembly.

Genet Sel Evol. 2021 Dec 18;53(1):96. doi: 10.1186/s12711-021-00688-1.

Generalizable characteristics of false-positive bacterial variant calls.

Microb Genom. 2021 Aug;7(8). doi: 10.1099/mgen.0.000615.