Vanderbilt Epidemiology Center, Division of Epidemiology, Department of Medicine, Vanderbilt University, Nashville, TN 37203, USA.
Bioinformatics. 2011 Aug 1;27(15):2112-8. doi: 10.1093/bioinformatics/btr324. Epub 2011 Jun 23.
Next-generation targeted resequencing of genomic regions implicated by genome-wide association studies (GWAS) is a common approach for following up indirect associations of common alleles. However, sequencing every sample from a well-powered GWAS at sufficient depth of coverage to call rare genotypes accurately is prohibitively expensive. As a result, many studies may use next-generation sequencing for single nucleotide polymorphism (SNP) discovery in a smaller number of samples, with the intent of genotyping candidate SNPs with rare alleles captured by resequencing. This approach is reasonable, but it can be inefficient for rare alleles if the samples for the resequencing experiment are not carefully selected.
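To see why random selection struggles with rare alleles, consider a simple baseline. This is a standard binomial calculation and is not the SampleSeq method. It assumes subjects are drawn at random from a population with allele frequency p, that chromosomes are independent, and that a single observed copy counts as a successful capture with perfect genotype calling. Under these assumptions, the probability of seeing at least one copy among n diploid subjects is 1 - (1 - p)^(2n). The function names below are illustrative only.

```python
import math


def capture_probability(allele_freq: float, n_samples: int) -> float:
    """P(at least one copy of the allele among 2 * n_samples chromosomes),
    assuming random sampling and independent chromosomes."""
    return 1.0 - (1.0 - allele_freq) ** (2 * n_samples)


def samples_needed(allele_freq: float, target_prob: float) -> int:
    """Smallest number of randomly chosen diploid subjects whose capture
    probability reaches target_prob (baseline, not SampleSeq)."""
    if not (0.0 < allele_freq < 1.0 and 0.0 < target_prob < 1.0):
        raise ValueError("allele_freq and target_prob must lie in (0, 1)")
    # Solve 1 - (1 - p)^(2n) >= P for n, then round up.
    n = math.ceil(math.log(1.0 - target_prob) / (2.0 * math.log(1.0 - allele_freq)))
    # Guard against floating-point rounding at the boundary.
    while capture_probability(allele_freq, n) < target_prob:
        n += 1
    while n > 1 and capture_probability(allele_freq, n - 1) >= target_prob:
        n -= 1
    return n


if __name__ == "__main__":
    for p in (0.01, 0.001):
        print(p, samples_needed(p, 0.95))
```

Under this baseline, a 1% allele needs 150 random subjects for a 95% chance of being observed at least once, and a 0.1% allele needs roughly 1,500. Sequencing depth and calling error, which push these numbers higher in practice, are ignored here.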
We have developed a probability-based approach, SampleSeq, for selecting samples for a targeted resequencing experiment. SampleSeq substantially increases the yield of rare disease alleles compared with random sampling of cases or controls, or with sampling based on genotypes at associated SNPs from GWAS data. This allows resequencing experiments to use smaller sample sizes, or to capture rarer risk alleles. When following up multiple regions, SampleSeq selects subjects so that all the regions are evenly represented. SampleSeq can also be used to calculate the resequencing sample size needed to increase the chance of capturing rare alleles at a desired frequency.
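The general idea of probability-based selection can be illustrated with a toy sketch. This is a hypothetical model and not SampleSeq's published algorithm. Each subject is assigned an assumed prior probability of carrying the rare risk allele, standing in for information from their genotype at the associated GWAS SNP. That prior is updated with case/control status by Bayes' rule under a simple relative-risk disease model, which is also an assumption. The subjects with the highest posterior carrier probabilities are then chosen. `Subject`, `posterior_carrier`, `select_subjects`, `relative_risk`, and `baseline_risk` are all illustrative names. Balancing selection across multiple regions is not shown.

```python
import math
from dataclasses import dataclass


@dataclass
class Subject:
    subject_id: str
    is_case: bool
    # Hypothetical input: P(carrier) given the subject's GWAS genotype,
    # before case/control status is taken into account.
    prior_carrier: float


def posterior_carrier(s: Subject, relative_risk: float, baseline_risk: float) -> float:
    """Bayes update of P(carrier) using case/control status under an assumed
    relative-risk model: P(case | carrier) = RR * baseline, P(case | non-carrier) = baseline."""
    if not 0.0 < baseline_risk < 1.0:
        raise ValueError("baseline_risk must lie in (0, 1)")
    p_case_carrier = min(1.0, relative_risk * baseline_risk)
    p_case_noncarrier = baseline_risk
    if s.is_case:
        like_c, like_n = p_case_carrier, p_case_noncarrier
    else:
        like_c, like_n = 1.0 - p_case_carrier, 1.0 - p_case_noncarrier
    numerator = like_c * s.prior_carrier
    return numerator / (numerator + like_n * (1.0 - s.prior_carrier))


def select_subjects(subjects, k, relative_risk, baseline_risk):
    """Pick the k subjects with the highest posterior carrier probability.
    Returns their ids, the expected number of carriers, and P(at least one
    carrier), treating subjects as independent (a simplifying assumption)."""
    scored = [(posterior_carrier(s, relative_risk, baseline_risk), s) for s in subjects]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    chosen = scored[:k]
    expected_carriers = sum(q for q, _ in chosen)
    p_at_least_one = 1.0 - math.prod(1.0 - q for q, _ in chosen)
    return [s.subject_id for _, s in chosen], expected_carriers, p_at_least_one
```

In this toy model, a case with a higher genotype-based prior ranks above a control with the same prior, because case status raises the carrier probability whenever the relative risk exceeds 1. The reported capture probability is the kind of quantity a sample-size calculation would target.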