Department of Mathematics, Imperial College London, London, United Kingdom.
PLoS Genet. 2013;9(8):e1003657. doi: 10.1371/journal.pgen.1003657. Epub 2013 Aug 8.
Genome-wide association studies (GWAS) have yielded significant advances in defining the genetic architecture of complex traits and disease. Still, a major hurdle of GWAS is narrowing down multiple genetic associations to a few causal variants for functional studies. This becomes critical in multi-phenotype GWAS, where the detection and interpretation of SNP(s)-trait(s) associations are complicated by complex linkage disequilibrium (LD) patterns between SNPs and by correlation between traits. Here we propose a computationally efficient algorithm (GUESS) to explore complex genetic-association models and maximize genetic variant detection. We integrated our algorithm with a new Bayesian strategy for multi-phenotype analysis to identify the specific contribution of each SNP to different trait combinations, and used it to study the genetic regulation of lipid metabolism in the Gutenberg Health Study (GHS). Although GHS is relatively small (n = 3,175) compared with the largest published meta-GWAS (n > 100,000), GUESS recovered most of the major associations and was better at refining multi-trait associations than alternative methods. Amongst the new findings provided by GUESS, we revealed strong associations of SORT1 with the TG-APOB phenotypic group and of LIPC with the TG-HDL phenotypic group; these associations were overlooked in the larger meta-GWAS and not revealed by competing approaches, and we replicated them in two independent cohorts. Moreover, in a simulation study that mimics real-case scenarios, we demonstrated the increased power of GUESS over alternative multi-phenotype approaches, both Bayesian and non-Bayesian. We also showed that our parallel implementation based on Graphics Processing Units outperforms alternative multi-phenotype methods. Beyond multivariate modelling of multiple phenotypes, our Bayesian model employs a flexible hierarchical prior structure for genetic effects that adapts to any correlation structure of the predictors and increases the power to identify associated variants. This provides a powerful tool for the analysis of diverse genomic features, including, for instance, gene expression and exome sequencing data, where complex dependencies are present in the predictor space.
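The abstract does not give the model equations, so the following is only a minimal sketch of the general idea of Bayesian variable selection with a prior that "adapts to the correlation structure of the predictors". It assumes a Zellner-type g-prior on the effects of a selected SNP subset, with covariance proportional to the inverse of that subset's genotype cross-product matrix, so that SNPs in LD share prior information. It also assumes a fixed unit-information choice g = n, a Jeffreys prior on the residual variance, independent Bernoulli inclusion priors, and a single trait. Exhaustive enumeration over small subsets stands in for GUESS's own stochastic search, which is not reproduced here. The functions `log_bayes_factor` and `posterior_over_subsets`, and the simulated genotypes, are hypothetical illustrations, not the authors' code.

```python
import itertools
import numpy as np


def log_bayes_factor(y, X, subset, g):
    """Log Bayes factor of the model using `subset` versus the intercept-only model.

    Assumes centred y and X, a Zellner g-prior on the selected effects
    (prior covariance proportional to the inverse of X_s^T X_s, so it follows
    the LD between the chosen SNPs), and the residual variance integrated out
    under a Jeffreys prior.
    """
    n, p = len(y), len(subset)
    if p == 0:
        return 0.0
    Xs = X[:, list(subset)]
    beta_hat, *_ = np.linalg.lstsq(Xs, y, rcond=None)
    r2 = 1.0 - np.sum((y - Xs @ beta_hat) ** 2) / np.sum(y ** 2)
    # Closed form under the g-prior: (1+g)^((n-1-p)/2) * (1 + g(1-R^2))^(-(n-1)/2)
    return 0.5 * (n - 1 - p) * np.log1p(g) - 0.5 * (n - 1) * np.log1p(g * (1.0 - r2))


def posterior_over_subsets(y, X, g, prior_incl=0.1, max_size=2):
    """Enumerate all SNP subsets up to `max_size` and return normalised posterior probabilities."""
    n_snps = X.shape[1]
    subsets, scores = [], []
    for k in range(max_size + 1):
        for subset in itertools.combinations(range(n_snps), k):
            # Hypothetical prior: each SNP is included independently with probability prior_incl.
            log_prior = k * np.log(prior_incl) + (n_snps - k) * np.log1p(-prior_incl)
            subsets.append(subset)
            scores.append(log_bayes_factor(y, X, subset, g) + log_prior)
    scores = np.array(scores)
    probs = np.exp(scores - scores.max())  # subtract the max for numerical stability
    return subsets, probs / probs.sum()


# --- Simulated example (hypothetical data) ---
rng = np.random.default_rng(0)
n, maf = 500, 0.3
base = rng.binomial(2, maf, size=n)  # genotype coded as 0/1/2 minor-allele counts


def ld_copy(snp, resample_frac):
    """Copy a SNP and resample a fraction of genotypes: strong but imperfect LD."""
    out = snp.copy()
    mask = rng.random(n) < resample_frac
    out[mask] = rng.binomial(2, maf, size=mask.sum())
    return out


# SNPs 0-2 form an LD block; SNPs 3-4 are independent of it.
X = np.column_stack([
    base, ld_copy(base, 0.1), ld_copy(base, 0.1),
    rng.binomial(2, maf, size=n), rng.binomial(2, maf, size=n),
]).astype(float)
y = 0.5 * X[:, 0] + rng.normal(size=n)  # only SNP 0 is causal

X -= X.mean(axis=0)
y -= y.mean()

subsets, probs = posterior_over_subsets(y, X, g=n)
inclusion = [probs[np.array([j in s for s in subsets])].sum() for j in range(X.shape[1])]
print("marginal inclusion probabilities:", np.round(inclusion, 3))
```

In this toy setting most posterior mass should fall on models containing SNPs from the LD block, with the block's SNPs competing for that mass according to how well each tags the causal signal. This is the fine-mapping difficulty the abstract describes, seen here for one trait at a time rather than in the multi-phenotype model.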