Department of Computer Science, Stanford University, Stanford, California 94305, USA.
Genome Res. 2012 Sep;22(9):1748-59. doi: 10.1101/gr.136127.111.
Genome-wide association studies have been successful in identifying single nucleotide polymorphisms (SNPs) associated with a large number of phenotypes. However, an associated SNP is likely part of a larger region of linkage disequilibrium. This makes it difficult to precisely identify the SNPs that have a biological link with the phenotype. We have systematically investigated the association of multiple types of ENCODE data with disease-associated SNPs and show that there is significant enrichment for functional SNPs among the currently identified associations. This enrichment is strongest when multiple sources of functional information are integrated and when the highest-confidence disease-associated SNPs are used. We propose an approach that integrates multiple types of functional data generated by the ENCODE Consortium to help identify "functional SNPs" that may be associated with the disease phenotype. Our approach generates putative functional annotations for up to 80% of all previously reported associations. We show that for most associations, the functional SNP most strongly supported by experimental evidence is a SNP in linkage disequilibrium with the reported association rather than the reported SNP itself. Our results show that the experimental data sets generated by the ENCODE Consortium can be successfully used to suggest functional hypotheses for variants associated with diseases and other phenotypes.
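The abstract does not spell out the scoring procedure, so the following is only an illustrative sketch of the general idea: for each reported (lead) SNP, consider the SNPs in linkage disequilibrium with it and pick the one with the most supporting functional evidence. The function name, the evidence labels, the SNP identifiers, and the r² cutoff of 0.8 are all assumptions made for illustration and are not taken from the paper.

```python
def best_functional_candidate(lead_id, ld_partners, annotations, r2_min=0.8):
    """Return the SNP with the most functional evidence among the lead SNP
    and its LD partners, or None if no candidate has any evidence.

    lead_id     -- identifier of the reported (lead) SNP
    ld_partners -- dict: lead SNP id -> list of (partner SNP id, r2) pairs
    annotations -- dict: SNP id -> set of evidence labels (hypothetical)
    r2_min      -- assumed LD cutoff; not a value stated in the abstract
    """
    # The lead SNP is always a candidate (r2 of 1.0 with itself).
    candidates = [(lead_id, 1.0)]
    candidates += [
        (snp_id, r2)
        for snp_id, r2 in ld_partners.get(lead_id, [])
        if r2 >= r2_min
    ]

    def score(item):
        snp_id, r2 = item
        # Rank by number of evidence types first, then by LD strength.
        return (len(annotations.get(snp_id, set())), r2)

    best_id, _ = max(candidates, key=score)
    return best_id if annotations.get(best_id) else None


# Toy, made-up inputs purely to exercise the function.
toy_annotations = {
    "lead1": {"dnase"},
    "proxyA": {"dnase", "tf_binding", "eqtl"},
    "proxyB": {"tf_binding"},
}
toy_ld = {"lead1": [("proxyA", 0.9), ("proxyB", 0.5)]}

print(best_functional_candidate("lead1", toy_ld, toy_annotations))  # proxyA
```

In this toy case the proxy SNP outranks the lead SNP because it carries more evidence types, mirroring the abstract's observation that the best-supported functional SNP is often one in linkage disequilibrium with the reported SNP rather than the reported SNP itself.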