Department of Integrative Biology, UC Berkeley, Berkeley, California 94720, USA.
Genet Epidemiol. 2010 Jul;34(5):479-91. doi: 10.1002/gepi.20501.
Most common hereditary diseases in humans are complex and multifactorial. Large-scale genome-wide association studies based on SNP genotyping have identified only a small fraction of the heritable variation of these diseases. One explanation may be that many rare variants (minor allele frequency, MAF, <5%), which are not included in the common genotyping platforms, contribute substantially to the genetic variation of these diseases. Next-generation sequencing, which allows the analysis of rare variants, is now becoming cheap enough to provide a viable alternative to SNP genotyping. In this paper, we present cost-effective protocols for using next-generation sequencing in association mapping studies based on pooled and un-pooled samples, and identify optimal designs with respect to the total number of individuals, the number of individuals per pool, and the sequencing coverage. We perform a small empirical study to evaluate the pooling variance in a realistic setting where pooling is combined with exon capture. To test for associations, we develop a likelihood ratio statistic that accounts for the high error rate of next-generation sequencing data. We also perform extensive simulations to determine the power and accuracy of this method. Overall, our findings suggest that, at a fixed cost, sequencing many individuals at shallower depth with larger pool sizes achieves higher power than sequencing a small number of individuals at greater depth with smaller pool sizes, even in the presence of high error rates. Our results provide guidelines for researchers who are developing association mapping studies based on next-generation sequencing.
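The idea of a likelihood ratio statistic that accounts for sequencing error can be illustrated with a minimal sketch. This is not the statistic developed in the paper, whose exact form the abstract does not give. It assumes a much simpler model: reads at one site are binomial, a read shows the minor allele with probability q = p(1 - e) + (1 - p)e for true pool allele frequency p and a known per-read error rate e, and pooling variance is ignored. All function names and the default e = 0.01 are hypothetical choices made for illustration.

```python
import math


def observed_freq(p, e):
    """Probability that a read shows the minor allele, given true frequency p
    and symmetric per-read error rate e (assumed model, not from the paper)."""
    return p * (1 - e) + (1 - p) * e


def clipped_mle(k, n, e):
    """MLE of the observed read frequency q, constrained to [e, 1 - e],
    which is the range of q when the true frequency p lies in [0, 1]."""
    return min(max(k / n, e), 1 - e)


def binom_loglik(k, n, q):
    """Binomial log-likelihood, dropping the constant binomial coefficient."""
    return k * math.log(q) + (n - k) * math.log(1 - q)


def lrt_statistic(k_case, n_case, k_ctrl, n_ctrl, e=0.01):
    """Likelihood ratio statistic for equal allele frequency in cases and controls.

    k_* are minor-allele read counts, n_* are total read counts at the site.
    """
    if not 0 < e < 0.5:
        raise ValueError("error rate e must lie strictly between 0 and 0.5")
    q_case = clipped_mle(k_case, n_case, e)
    q_ctrl = clipped_mle(k_ctrl, n_ctrl, e)
    # Under the null hypothesis both groups share one frequency.
    q_null = clipped_mle(k_case + k_ctrl, n_case + n_ctrl, e)
    ll_alt = binom_loglik(k_case, n_case, q_case) + binom_loglik(k_ctrl, n_ctrl, q_ctrl)
    ll_null = binom_loglik(k_case, n_case, q_null) + binom_loglik(k_ctrl, n_ctrl, q_null)
    # Guard against tiny negative values caused by floating-point rounding.
    return max(0.0, 2 * (ll_alt - ll_null))


def p_value_1df(stat):
    """Upper tail of a chi-square distribution with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2))


if __name__ == "__main__":
    stat = lrt_statistic(k_case=60, n_case=2000, k_ctrl=30, n_ctrl=2000)
    print(stat, p_value_1df(stat))
```

The chi-square reference distribution is only an asymptotic approximation and can be unreliable when an estimate sits on the boundary at e or 1 - e, which is exactly the rare-variant regime. A real analysis would also need to model pooling variance, which the paper evaluates empirically.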