Division of Biostatistics, City of Hope, Duarte, California, United States of America.
PLoS One. 2010 Dec 20;5(12):e14318. doi: 10.1371/journal.pone.0014318.
We describe three statistical results that we have found useful in case-control genetic association testing. All three involve combining the discovery of novel genetic variants, usually by sequencing, with genotyping methods that recognize previously discovered variants. First, we consider expanding the list of known variants by concentrating variant discovery in cases. Although naively including cases-only sequencing data would create a bias, we show that some sequencing data may be retained even if controls are not sequenced. Furthermore, for alleles of intermediate frequency, cases-only sequencing with bias correction entails little if any loss of power compared with dividing the same sequencing effort between cases and controls. Second, we investigate more strongly focused variant discovery to obtain greater enrichment for disease-related variants. We show how case status, family history, and marker sharing enrich the discovery set by increments that are multiplicative with penetrance, enabling the preferential discovery of high-penetrance variants. The third result applies when sequencing is the primary means of counting alleles in both cases and controls, but a supplementary pooled genotyping sample is used to identify the variants that are very rare. We show that this raises no validity issues, and we evaluate a less expensive and more adaptive approach to judging rarity, based on group-specific variants. We demonstrate the important and unusual caveat that this approach requires equal sample sizes for validity. These three results can be used to detect the association of rare genetic variants with disease more efficiently.
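The bias from naively including cases-only sequencing data can be illustrated with a short calculation. The sketch below is not the bias correction developed in the paper. It only quantifies, under the null hypothesis of no association, how much the allele frequency in the sequenced cases is inflated once we condition on the variant having been discovered in those same cases. The assumptions are ours: chromosomes are independent draws with allele frequency `p`, and "discovery" means at least one copy is seen among the sequenced cases. The function name `discovery_inflation` and its parameters are hypothetical.

```python
def discovery_inflation(p: float, n_seq_cases: int) -> float:
    """Inflation of the allele frequency in sequenced cases, given discovery.

    Under the null, the allele count X among m = 2 * n_seq_cases chromosomes
    is Binomial(m, p). Because X = 0 contributes nothing to the mean,
    E[X | X >= 1] = m * p / P(X >= 1), so the conditional allele frequency
    is p / P(X >= 1) and the inflation factor relative to p is 1 / P(X >= 1).
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if n_seq_cases < 1:
        raise ValueError("n_seq_cases must be at least 1")
    m = 2 * n_seq_cases                   # chromosomes sequenced in cases
    p_discovered = 1.0 - (1.0 - p) ** m   # P(at least one copy observed)
    return 1.0 / p_discovered


if __name__ == "__main__":
    # Illustrative values only: 50 sequenced cases, rare vs. intermediate allele.
    for freq in (0.001, 0.01, 0.2):
        print(freq, discovery_inflation(freq, n_seq_cases=50))
```

With 50 sequenced cases, an allele of frequency 0.001 appears roughly tenfold enriched in the discovery cases even though it has no effect on disease, whereas an allele of frequency 0.2 is essentially unaffected because it would almost certainly have been discovered anyway. This is consistent with the abstract's point that the cost of handling the bias is small for alleles of intermediate frequency.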