Sanjak Jaleal S, Long Anthony D, Thornton Kevin R
Department of Ecology and Evolutionary Biology, University of California Irvine, California 92697; Center for Complex Biological Systems, University of California Irvine, California 92697
G3 (Bethesda). 2016 Apr 7;6(4):1023-30. doi: 10.1534/g3.115.026013.
Genome-wide association studies (GWAS) have associated many single variants with complex disease, yet the better part of heritable complex disease risk remains unexplained. Analytical tools designed to work under specific population genetic models are needed. Rare variants are increasingly shown to be important in human complex disease, but most existing GWAS data do not cover rare variants. Explicit population genetic models predict that genes contributing to complex traits and experiencing recurrent, unconditionally deleterious, mutation will harbor multiple rare, causative mutations of subtle effect. It is difficult to identify genes harboring rare variants of large effect that contribute to complex disease risk via the single marker association tests typically used in GWAS. Gene/region-based association tests may have the power to detect associations by combining information from multiple markers, but have yielded limited success in practice. This is partially because many methods have not been widely applied. Here, we empirically demonstrate the utility of a procedure based on the rank truncated product (RTP) method, filtered to reduce the effects of linkage disequilibrium. We apply the procedure to the Wellcome Trust Case Control Consortium (WTCCC) data set, and uncover previously unidentified associations, some of which have been replicated in much larger studies. We show that, in the absence of significant rare variant coverage, RTP-based methods still have the power to detect associated genes. We recommend that RTP-based methods be applied to all existing GWAS data to maximize the usefulness of those data. For this, we provide efficient software implementing our procedure.
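As a rough illustration of the rank truncated product idea named in the abstract, the sketch below combines the k smallest single-marker p-values in a region into one statistic (their product, computed on the log scale) and estimates a region-level p-value by simulation. It is not the authors' software: the function names `rtp_statistic` and `rtp_pvalue_independent` are hypothetical, and the null distribution here assumes independent markers with uniform p-values. Real GWAS markers are correlated through linkage disequilibrium, which is why the procedure described above includes a filtering step; with real data the null would more plausibly come from permuting case/control labels.

```python
import math
import random


def rtp_statistic(pvalues, k):
    """Log of the product of the k smallest p-values in a region."""
    if not 1 <= k <= len(pvalues):
        raise ValueError("k must be between 1 and the number of p-values")
    if any(not 0.0 < p <= 1.0 for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")
    smallest = sorted(pvalues)[:k]
    # Summing logs avoids underflow when many small p-values are multiplied.
    return sum(math.log(p) for p in smallest)


def rtp_pvalue_independent(pvalues, k, n_sim=10000, seed=0):
    """Monte Carlo region-level p-value.

    ASSUMPTION: markers are independent, so null p-values are drawn as
    independent uniforms. This ignores linkage disequilibrium.
    """
    rng = random.Random(seed)
    observed = rtp_statistic(pvalues, k)
    n_markers = len(pvalues)
    hits = 0
    for _ in range(n_sim):
        # 1 - random() lies in (0, 1], so log() is always defined.
        null = [1.0 - rng.random() for _ in range(n_markers)]
        if rtp_statistic(null, k) <= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_sim + 1)
```

For example, `rtp_pvalue_independent([1e-6, 1e-5, 0.5, 0.9], k=2)` asks how often two of four independent null markers would jointly look at least as extreme as the two smallest observed p-values. The choice of k is a tuning decision that this sketch leaves to the caller.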