Accurate detection and genotyping of SNPs utilizing population sequencing data.

Affiliation

Scripps Translational Science Institute, The Scripps Research Institute, La Jolla, CA 92037, USA.

Publication information

Genome Res. 2010 Apr;20(4):537-45. doi: 10.1101/gr.100040.109. Epub 2010 Feb 11.

DOI:10.1101/gr.100040.109

PMID:20150320

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2847757/

Abstract

Next-generation sequencing technologies have made it possible to sequence targeted regions of the human genome in hundreds of individuals. Deep sequencing represents a powerful approach for the discovery of the complete spectrum of DNA sequence variants in functionally important genomic intervals. Current methods for single nucleotide polymorphism (SNP) detection are designed to detect SNPs from single individual sequence data sets. Here, we describe a novel method SNIP-Seq (single nucleotide polymorphism identification from population sequence data) that leverages sequence data from a population of individuals to detect SNPs and assign genotypes to individuals. To evaluate our method, we utilized sequence data from a 200-kilobase (kb) region on chromosome 9p21 of the human genome. This region was sequenced in 48 individuals (five sequenced in duplicate) using the Illumina GA platform. Using this data set, we demonstrate that our method is highly accurate for detecting variants and can filter out false SNPs that are attributable to sequencing errors. The concordance of sequencing-based genotype assignments between duplicate samples was 98.8%. The 200-kb region was independently sequenced to a high depth of coverage using two sequence pools containing the 48 individuals. Many of the novel SNPs identified by SNIP-Seq from the individual sequencing were validated by the pooled sequencing data and were subsequently confirmed by Sanger sequencing. We estimate that SNIP-Seq achieves a low false-positive rate of approximately 2%, improving upon the higher false-positive rate for existing methods that do not utilize population sequence data. Collectively, these results suggest that analysis of population sequencing data is a powerful approach for the accurate detection of SNPs and the assignment of genotypes to individual samples.
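The duplicate-sample concordance reported in the abstract can be illustrated with a minimal sketch. The abstract does not say how the authors computed it, so the following is only one plausible reading: genotypes are assumed to be coded as alternate-allele counts (0, 1, 2), uncalled sites are assumed to be `None` and skipped, and the function name `genotype_concordance` and the toy data are hypothetical.

```python
from typing import Optional, Sequence


def genotype_concordance(
    calls_a: Sequence[Optional[int]],
    calls_b: Sequence[Optional[int]],
) -> float:
    """Fraction of jointly called sites where two replicates agree.

    Assumption: genotypes are alternate-allele counts (0, 1, 2) and
    None marks a site with no call in that replicate.
    """
    if len(calls_a) != len(calls_b):
        raise ValueError("replicates must cover the same sites")

    compared = 0
    matches = 0
    for a, b in zip(calls_a, calls_b):
        # Skip sites that were not called in one or both replicates.
        if a is None or b is None:
            continue
        compared += 1
        if a == b:
            matches += 1

    if compared == 0:
        raise ValueError("no sites were called in both replicates")
    return matches / compared


# Toy data (not from the paper): six sites, one uncalled, one discordant.
replicate_1 = [0, 1, 2, 1, None, 0]
replicate_2 = [0, 1, 2, 0, 1, 0]
concordance = genotype_concordance(replicate_1, replicate_2)
print(f"concordance = {concordance:.3f}")
```

Skipping sites without a call in both replicates is a design choice of this sketch; counting them as discordant would give a lower, more conservative figure.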
