Gordon Derek, Finch Stephen J, De La Vega Francisco M
Department of Genetics, Rutgers University, Piscataway, N.J., USA.
Hum Hered. 2011;71(2):113-25. doi: 10.1159/000325590. Epub 2011 Jul 6.
Genome-wide association studies (GWAS) have been successful in identifying common genetic variation reproducibly associated with disease. However, most associated variants confer very small risk, and even after meta-analysis of large cohorts, a large fraction of the expected heritability remains unexplained. A possible explanation is that rare variants, currently undetected by GWAS with SNP arrays, could contribute a large fraction of risk when present in cases. This concept has spurred great interest in exploring the role of rare variants in disease. As the cost of sequencing continues to plummet, it is becoming feasible to directly sequence case-control samples to test for disease association, including association with rare variants. We have developed a test statistic that allows for association testing among cases and controls using data directly from sequencing reads. In addition, our method allows for random errors in reads. We determine the probability of a true genotype call based on the observed base pair reads using the expectation-maximization algorithm. We apply the SumStat procedure to obtain a single statistic for a group of multiple rare variant loci. We document the validity of our method through simulations. Our results suggest that our statistic maintains the correct type I error rate, even in the presence of differential misclassification for sequence reads, and that it has good power under a number of scenarios. Finally, our SumStat results show power at least as good as the maximum single-locus results.
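As an illustration of how an expectation-maximization step can turn read counts into genotype probabilities, the following is a minimal sketch and not the authors' implementation. It assumes a single biallelic locus, a symmetric per-read error rate that is fixed and known, and a binomial model for the number of reads showing the variant allele. The function names (`binom_pmf`, `em_genotype_posteriors`) and the default `error_rate` are hypothetical choices made for this example.

```python
import math


def binom_pmf(k, n, p):
    """Probability of k successes in n Bernoulli(p) trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def em_genotype_posteriors(reads, error_rate=0.01, n_iter=200, tol=1e-10):
    """Estimate genotype frequencies and per-individual genotype posteriors.

    reads: list of (total_reads, variant_reads), one pair per individual.
    Genotypes are coded 0, 1, 2 = number of copies of the variant allele.
    Assumption: each read is miscalled independently with probability
    error_rate, so a read shows the variant with probability
    error_rate (genotype 0), 0.5 (genotype 1), or 1 - error_rate (genotype 2).
    """
    p_variant = [error_rate, 0.5, 1.0 - error_rate]
    # The read likelihoods do not depend on the frequencies: compute them once.
    lik = [[binom_pmf(k, n, p) for p in p_variant] for n, k in reads]

    def e_step(freqs):
        # Posterior genotype probabilities given the current frequencies.
        post = []
        for row in lik:
            weights = [f * l for f, l in zip(freqs, row)]
            total = sum(weights)
            post.append([w / total for w in weights])
        return post

    freqs = [1 / 3, 1 / 3, 1 / 3]  # uninformative starting point
    for _ in range(n_iter):
        post = e_step(freqs)
        # M-step: each frequency is the mean posterior weight of its genotype.
        new_freqs = [sum(p[g] for p in post) / len(post) for g in range(3)]
        change = max(abs(a - b) for a, b in zip(new_freqs, freqs))
        freqs = new_freqs
        if change < tol:
            break
    # Return posteriors that are consistent with the final frequencies.
    return freqs, e_step(freqs)


if __name__ == "__main__":
    # Hypothetical data: four individuals with 20 reads each.
    example_reads = [(20, 0), (20, 10), (20, 20), (20, 1)]
    freqs, post = em_genotype_posteriors(example_reads)
    print("estimated genotype frequencies:", freqs)
    for (n, k), p in zip(example_reads, post):
        print(f"{k}/{n} variant reads -> posterior {p}")
```

In this sketch, an individual with a single variant read out of 20 is still assigned mostly to genotype 0, because one erroneous read is far more likely under the assumed error rate than a heterozygote producing 19 reference reads. A fuller treatment would also estimate the error rate and carry the posterior probabilities, rather than hard genotype calls, into the association statistic.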