Scripps Translational Science Institute, The Scripps Research Institute, La Jolla, CA 92037, USA.
Genome Res. 2010 Apr;20(4):537-45. doi: 10.1101/gr.100040.109. Epub 2010 Feb 11.
Next-generation sequencing technologies have made it possible to sequence targeted regions of the human genome in hundreds of individuals. Deep sequencing is a powerful approach for discovering the complete spectrum of DNA sequence variants in functionally important genomic intervals. Current methods for single nucleotide polymorphism (SNP) detection are designed to detect SNPs from the sequence data of a single individual. Here, we describe a novel method, SNIP-Seq (single nucleotide polymorphism identification from population sequence data), that leverages sequence data from a population of individuals to detect SNPs and assign genotypes to individuals. To evaluate our method, we used sequence data from a 200-kilobase (kb) region on chromosome 9p21 of the human genome. This region was sequenced in 48 individuals (five sequenced in duplicate) using the Illumina GA platform. Using this data set, we demonstrate that our method is highly accurate for detecting variants and can filter out false SNPs that are attributable to sequencing errors. The concordance of sequencing-based genotype assignments between duplicate samples was 98.8%. The 200-kb region was independently sequenced to a high depth of coverage using two sequence pools containing the 48 individuals. Many of the novel SNPs identified by SNIP-Seq from the individual sequencing were validated by the pooled sequencing data and were subsequently confirmed by Sanger sequencing. We estimate that SNIP-Seq achieves a false-positive rate of approximately 2%, lower than that of existing methods that do not use population sequence data. Collectively, these results suggest that analysis of population sequencing data is a powerful approach for the accurate detection of SNPs and the assignment of genotypes to individual samples.
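The abstract describes the idea of pooling evidence across a population only at a high level, so the following is a minimal sketch of one common way to do it, not the published SNIP-Seq model. It assumes a biallelic site, a fixed per-base error rate (`ERROR_RATE = 0.01`, hypothetical), Hardy-Weinberg genotype priors, an EM estimate of the population alternate-allele frequency, and a likelihood-ratio test against a frequency of zero with a hypothetical threshold of 10. All function names are illustrative. A concordance helper mirrors the duplicate-sample comparison reported above, assuming missing calls (`None`) are skipped.

```python
import math
from typing import List, Optional, Sequence, Tuple

ERROR_RATE = 0.01  # hypothetical per-base sequencing error rate (assumption)


def logsumexp(xs: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(x))), tolerating -inf entries."""
    m = max(xs)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(x - m) for x in xs))


def genotype_log_likelihoods(ref: int, alt: int, err: float = ERROR_RATE) -> List[float]:
    """log P(reads | g) for g = 0, 1, 2 copies of the alt allele (binomial constant dropped)."""
    lls = []
    for g in (0, 1, 2):
        # Chance that one read shows the alt base: true alt copy read correctly, or ref misread.
        p_alt = (g / 2) * (1 - err) + (1 - g / 2) * err
        lls.append(alt * math.log(p_alt) + ref * math.log(1 - p_alt))
    return lls


def hwe_log_priors(f: float) -> List[float]:
    """Hardy-Weinberg genotype log-priors for alt-allele frequency f."""
    probs = [(1 - f) ** 2, 2 * f * (1 - f), f ** 2]
    return [math.log(p) if p > 0 else -math.inf for p in probs]


def site_log_likelihood(gls: List[List[float]], f: float) -> float:
    """Log-likelihood of all individuals' reads at one site, given population frequency f."""
    priors = hwe_log_priors(f)
    return sum(logsumexp([p + l for p, l in zip(priors, gl)]) for gl in gls)


def estimate_alt_freq(gls: List[List[float]], iters: int = 50) -> float:
    """EM estimate of the alt-allele frequency shared by the population."""
    f = 0.1  # arbitrary starting value
    for _ in range(iters):
        priors = hwe_log_priors(f)
        expected_alt = 0.0
        for gl in gls:
            joint = [p + l for p, l in zip(priors, gl)]
            norm = logsumexp(joint)
            # E-step: expected number of alt copies carried by this individual.
            expected_alt += sum(g * math.exp(j - norm) for g, j in enumerate(joint))
        f = expected_alt / (2 * len(gls))  # M-step
    return f


def call_site(reads: List[Tuple[int, int]], lrt_threshold: float = 10.0):
    """reads: one (ref_count, alt_count) pair per individual. Returns (is_snp, f_hat, genotypes)."""
    gls = [genotype_log_likelihoods(r, a) for r, a in reads]
    f_hat = estimate_alt_freq(gls)
    lrt = 2 * (site_log_likelihood(gls, f_hat) - site_log_likelihood(gls, 0.0))
    priors = hwe_log_priors(f_hat)
    # Assign each individual its maximum a posteriori genotype under the population prior.
    genotypes = [max(range(3), key=lambda g: priors[g] + gl[g]) for gl in gls]
    return lrt >= lrt_threshold, f_hat, genotypes


def genotype_concordance(calls_a: Sequence[Optional[int]], calls_b: Sequence[Optional[int]]) -> float:
    """Fraction of sites with identical genotype calls in two replicates (no-calls skipped)."""
    pairs = [(a, b) for a, b in zip(calls_a, calls_b) if a is not None and b is not None]
    if not pairs:
        raise ValueError("no sites called in both replicates")
    return sum(a == b for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # A real variant: evidence accumulates across carriers.
    print(call_site([(20, 0), (10, 10), (0, 20), (20, 0), (11, 9), (20, 0)]))
    # A single error read in one individual is not enough to call a SNP.
    print(call_site([(20, 0)] * 5 + [(19, 1)]))
    print(genotype_concordance([0, 1, 2, 0], [0, 1, 1, 0]))  # 0.75
```

In this sketch, an isolated alternate read in one individual contributes little once every individual's reads are evaluated under a shared allele frequency, while a true variant gains support from each carrier. The error model, priors, and threshold are simplifying assumptions; the published method may weigh evidence differently.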