A new expectation-maximization statistical test for case-control association studies considering rare variants obtained by high-throughput sequencing.

Author information

Gordon Derek, Finch Stephen J, De La Vega Francisco M

Affiliations

Department of Genetics, Rutgers University, Piscataway, N.J., USA.

Publication information

Hum Hered. 2011;71(2):113-25. doi: 10.1159/000325590. Epub 2011 Jul 6.

DOI:10.1159/000325590

PMID:21734402

Abstract

Genome-wide association studies (GWAS) have been successful in identifying common genetic variation reproducibly associated with disease. However, most associated variants confer very small risk, and after meta-analysis of large cohorts a large fraction of the expected heritability remains unexplained. A possible explanation is that rare variants currently undetected by GWAS with SNP arrays could contribute a large fraction of risk when present in cases. This concept has spurred great interest in exploring the role of rare variants in disease. As the cost of sequencing continues to plummet, it is becoming feasible to directly sequence case-control samples for testing disease association, including rare variants. We have developed a test statistic that allows for association testing among cases and controls using data directly from sequencing reads. In addition, our method allows for random errors in reads. We determine the probability of a true genotype call based on the observed base pair reads using the expectation-maximization algorithm. We apply the SumStat procedure to obtain a single statistic for a group of multiple rare variant loci. We document the validity of our method through simulations. Our results suggest that our statistic maintains the correct type I error rate, even in the presence of differential misclassification for sequence reads, and that it has good power under a number of scenarios. Finally, our SumStat results show power at least as good as the maximum single-locus results.
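
To make the expectation-maximization step more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a single biallelic locus, a known symmetric per-read error rate, and a binomial model for the number of reads showing the variant allele; the names `em_genotype_freqs`, `posterior_genotypes`, and `error_rate` are hypothetical and chosen only for illustration. The E-step computes each individual's posterior probability of carrying 0, 1, or 2 copies of the variant, and the M-step re-estimates the genotype frequencies from those posteriors.

```python
import math


def binom_pmf(k, n, p):
    """Probability of k variant reads out of n when each read shows the variant with probability p."""
    return math.comb(n, k) * p**k * (1.0 - p) ** (n - k)


def posterior_genotypes(likelihoods, freqs):
    """E-step: posterior probability of each genotype (0, 1, 2 variant copies) per individual."""
    posteriors = []
    for row in likelihoods:
        weights = [f * l for f, l in zip(freqs, row)]
        total = sum(weights)
        posteriors.append([w / total for w in weights])
    return posteriors


def em_genotype_freqs(reads, error_rate=0.01, n_iter=200, tol=1e-10):
    """Estimate genotype frequencies from (variant_reads, total_reads) pairs.

    Assumption (not from the abstract): reads are independent, and a read is
    miscalled with a known, symmetric probability `error_rate` in (0, 1).
    """
    if not reads:
        raise ValueError("reads must contain at least one individual")
    # Probability that a single read shows the variant allele, given the true genotype.
    p_variant = [error_rate, 0.5, 1.0 - error_rate]
    likelihoods = [[binom_pmf(k, n, p) for p in p_variant] for k, n in reads]
    freqs = [1.0 / 3.0] * 3  # uninformative starting point
    for _ in range(n_iter):
        posteriors = posterior_genotypes(likelihoods, freqs)
        # M-step: each genotype frequency is the average posterior weight.
        new_freqs = [sum(post[g] for post in posteriors) / len(reads) for g in range(3)]
        change = max(abs(a - b) for a, b in zip(new_freqs, freqs))
        freqs = new_freqs
        if change < tol:
            break
    return freqs, posterior_genotypes(likelihoods, freqs)
```

Under these assumptions, the posteriors returned for cases and controls could then feed a per-locus association statistic. How the published test constructs that statistic and how SumStat combines loci is not specified in the abstract, so it is not sketched here.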
