New York Genome Center, New York, NY 10013, USA; Department of Biological Sciences, Columbia University, New York, NY 10027, USA.
Am J Hum Genet. 2014 Apr 3;94(4):559-73. doi: 10.1016/j.ajhg.2014.03.004.
Annotations of gene structures and regulatory elements can inform genome-wide association studies (GWASs). However, choosing the relevant annotations for interpreting an association study of a given trait remains challenging. I describe a statistical model that uses association statistics computed across the genome to identify classes of genomic elements that are enriched with or depleted of loci influencing a trait. The model naturally incorporates multiple types of annotations. I applied the model to GWASs of 18 human traits, including red blood cell traits, platelet traits, glucose levels, lipid levels, height, body mass index, and Crohn disease. For each trait, I used the model to evaluate the relevance of 450 different genomic annotations, including protein-coding genes, enhancers, and DNase-I hypersensitive sites in over 100 tissues and cell lines. The fraction of phenotype-associated SNPs influencing protein sequence ranged from around 2% (for platelet volume) up to around 20% (for low-density lipoprotein cholesterol), repressed chromatin was significantly depleted for SNPs associated with several traits, and cell-type-specific DNase-I hypersensitive sites were enriched with SNPs associated with several traits (for example, the spleen in platelet volume). Finally, reweighting each GWAS by using information from functional genomics increased the number of loci with high-confidence associations by around 5%.
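The reweighting step can be illustrated with a small sketch. The abstract does not give the model's equations, so everything here is an assumption chosen for illustration: a logistic prior on association built from annotation memberships, combined with a per-locus Bayes factor. The annotation names, effect sizes, baseline prior, and Bayes factors are hypothetical and are not values from the study.

```python
import math


def prior_from_annotations(annotations, baseline, effects):
    """Prior probability of association for one locus.

    Assumed form (not taken from the paper): a logistic function of a
    baseline log-odds plus one additive effect per annotation present.
    """
    z = baseline + sum(effects[name] for name in annotations)
    return 1.0 / (1.0 + math.exp(-z))


def posterior_association(bayes_factor, prior):
    """Posterior probability of association from a Bayes factor and a prior."""
    weighted = prior * bayes_factor
    return weighted / (weighted + (1.0 - prior))


# Hypothetical log-odds effects: positive means enriched, negative means depleted.
effects = {"coding": 2.0, "repressed": -1.0}

# Hypothetical baseline: a 1% prior for a locus with no annotations.
baseline = math.log(0.01 / 0.99)

# Three hypothetical loci with identical association evidence (Bayes factor 50)
# that differ only in their annotations.
loci = [
    ("locus_A", 50.0, ["coding"]),
    ("locus_B", 50.0, []),
    ("locus_C", 50.0, ["repressed"]),
]

posteriors = {
    name: posterior_association(bf, prior_from_annotations(ann, baseline, effects))
    for name, bf, ann in loci
}

for name, value in posteriors.items():
    print(f"{name}: posterior = {value:.3f}")
```

With identical association evidence, the locus in the enriched annotation receives a higher posterior than the unannotated locus, and the locus in the depleted annotation receives a lower one. This is the sense in which functional information can move borderline loci across a confidence threshold.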