Department of Biostatistics, University of Michigan, Ann Arbor, 48109, USA.
Am J Hum Genet. 2010 Nov 12;87(5):604-17. doi: 10.1016/j.ajhg.2010.10.012.
Next-generation sequencing technology has revolutionized our ability to study the contribution of rare genetic variation to heritable traits. However, existing single-marker association tests are underpowered for detecting rare risk variants. A more powerful approach pools multiple rare variants from the same gene into a single test statistic. Current pooling methods can be limited because they generally assume high-quality genotypes derived from deep-coverage sequencing, which may not be available. In this paper, we consider an intuitive and computationally efficient pooling statistic, the cumulative minor-allele test (CMAT). We assess the performance of the CMAT and other pooling methods on datasets simulated with population genetic models to contain realistic levels of neutral variation. We consider study designs ranging from exon-only to whole-gene analyses that include noncoding variants. For all study designs, the CMAT achieves power comparable to that of previously proposed methods. We then extend the CMAT to probabilistic genotypes and describe its application to low-coverage sequencing and imputation data. We show that augmenting sequence data with imputed samples is a practical way to increase the power of rare-variant studies. We also provide a way to control for confounding variables such as population stratification. Finally, we demonstrate that our method makes it possible to use external imputation templates to analyze rare variants imputed into existing GWAS datasets. As proof of principle, we performed a CMAT analysis of more than 8 million SNPs that we imputed into the GAIN psoriasis dataset by using haplotypes from the 1000 Genomes Project.
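The abstract does not give the CMAT formula, so the following is only an illustration of the general pooling idea, not the authors' statistic. This minimal sketch sums minor-allele counts across all variants in a gene separately for cases and controls. It then compares the pooled counts with a Pearson chi-square on the resulting 2x2 allele table and obtains a p-value by permuting case/control labels, because alleles pooled across variants and individuals are not independent. It works on allele dosages, so expected dosages from imputation or low-coverage calls can be passed in place of hard calls. That loosely mirrors the extension to probabilistic genotypes described above. The function names `pooled_allele_chi2` and `permutation_p_value` are hypothetical, and variant weighting and covariate adjustment are deliberately omitted.

```python
import numpy as np


def pooled_allele_chi2(dosages, is_case):
    """Illustrative pooled statistic (hypothetical, not the published CMAT).

    dosages: (n_individuals, n_variants) minor-allele counts in [0, 2];
             expected dosages from imputation or low-coverage calls also work.
    is_case: boolean array of length n_individuals.
    """
    dosages = np.asarray(dosages, dtype=float)
    is_case = np.asarray(is_case, dtype=bool)
    n_variants = dosages.shape[1]
    # Total alleles observed per group, pooled over every variant in the gene.
    alleles_case = 2.0 * is_case.sum() * n_variants
    alleles_ctrl = 2.0 * (~is_case).sum() * n_variants
    # Cumulative minor-allele counts per group, pooled over variants.
    minor_case = dosages[is_case].sum()
    minor_ctrl = dosages[~is_case].sum()
    table = np.array([
        [minor_case, alleles_case - minor_case],
        [minor_ctrl, alleles_ctrl - minor_ctrl],
    ])
    row = table.sum(axis=1, keepdims=True)
    col = table.sum(axis=0, keepdims=True)
    expected = row * col / table.sum()
    # An empty row or column carries no information about association.
    if (expected == 0).any():
        return 0.0
    return float(((table - expected) ** 2 / expected).sum())


def permutation_p_value(dosages, is_case, n_perm=1000, seed=0):
    """Permutation p-value for the illustrative pooled statistic."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(is_case, dtype=bool)
    observed = pooled_allele_chi2(dosages, labels)
    hits = 0
    for _ in range(n_perm):
        # Small tolerance so ties are not lost to floating-point rounding.
        if pooled_allele_chi2(dosages, rng.permutation(labels)) >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)
```

For example, `permutation_p_value(dosages, is_case, n_perm=10_000)` would score one gene. The permutation step, rather than a 1-degree-of-freedom chi-square reference, is the design choice that keeps the test valid when each individual contributes alleles at many correlated sites.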