Nakka Priyanka, Raphael Benjamin J, Ramachandran Sohini
Department of Ecology and Evolutionary Biology, Brown University, Providence, Rhode Island 02912; Center for Computational Molecular Biology, Brown University, Providence, Rhode Island 02912.
Center for Computational Molecular Biology, Brown University, Providence, Rhode Island 02912; Department of Computer Science, Brown University, Providence, Rhode Island 02912.
Genetics. 2016 Oct;204(2):783-798. doi: 10.1534/genetics.116.188391. Epub 2016 Aug 3.
Genome-wide association (GWA) studies typically lack power to detect genotypes significantly associated with complex diseases, where different causal mutations of small effect may be present across cases. A common, tractable approach for identifying genomic elements associated with complex traits is to evaluate combinations of variants in known pathways or gene sets with shared biological function. Such gene-set analyses require the computation of gene-level P-values or gene scores; these gene scores are also useful when generating hypotheses for experimental validation. However, commonly used methods for generating GWA gene scores are computationally inefficient, biased by gene length, imprecise, or have low true positive rate (TPR) at low false positive rates (FPR), leading to erroneous hypotheses for functional validation. Here we introduce a new method, PEGASUS, for analytically calculating gene scores. PEGASUS produces gene scores with as much as 10 orders of magnitude higher numerical precision than competing methods. In simulation, PEGASUS outperforms existing methods, achieving up to 30% higher TPR when the FPR is fixed at 1%. We use gene scores from PEGASUS as input to HotNet2 to identify networks of interacting genes associated with multiple complex diseases and traits; this is the first application of HotNet2 to common variation. In ulcerative colitis and waist-hip ratio, we discover networks that include genes previously associated with these phenotypes, as well as novel candidate genes. In contrast, existing methods fail to identify these networks. We also identify networks for attention-deficit/hyperactivity disorder, in which GWA studies have yet to identify any significant SNPs.
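The comparison metric quoted in the abstract, TPR at a fixed FPR of 1%, can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function name `tpr_at_fixed_fpr` and its inputs (gene-level P-values for simulated null genes and for simulated causal genes) are assumptions made for illustration only.

```python
def tpr_at_fixed_fpr(null_pvalues, causal_pvalues, fpr=0.01):
    """Hypothetical helper: true positive rate at a fixed false positive rate.

    null_pvalues   -- gene-level P-values for genes simulated with no causal SNPs
    causal_pvalues -- gene-level P-values for genes simulated with causal SNPs
    fpr            -- target false positive rate, 0 <= fpr < 1
    """
    if not null_pvalues or not causal_pvalues:
        raise ValueError("both P-value lists must be non-empty")
    if not 0 <= fpr < 1:
        raise ValueError("fpr must satisfy 0 <= fpr < 1")

    null_sorted = sorted(null_pvalues)
    # Number of null genes we are allowed to call significant.
    allowed_false_positives = int(fpr * len(null_sorted))
    # Calling genes with P strictly below this cutoff admits at most
    # `allowed_false_positives` null genes, so the realized FPR is <= fpr.
    threshold = null_sorted[allowed_false_positives]

    true_positives = sum(1 for p in causal_pvalues if p < threshold)
    return threshold, true_positives / len(causal_pvalues)


# Toy input: 100 null genes with evenly spaced P-values, 4 causal genes.
null_p = [i / 100 for i in range(1, 101)]
causal_p = [0.001, 0.015, 0.5, 0.03]
threshold, tpr = tpr_at_fixed_fpr(null_p, causal_p, fpr=0.01)
print(threshold, tpr)
```

The cutoff is taken from the empirical null distribution rather than from a nominal significance level, so two methods are compared at the same realized false positive rate even if one of them produces miscalibrated P-values.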