Ning Kaida, Gettler Kyle, Zhang Wei, Ng Sok Meng, Bowen B Monica, Hyams Jeffrey, Stephens Michael C, Kugathasan Subra, Denson Lee A, Schadt Eric E, Hoffman Gabriel E, Cho Judy H
Department of Genetics and Genomic Sciences.
Department of Genetics, Yale University, New Haven, CT 06520, USA.
Hum Mol Genet. 2015 Jul 15;24(14):4147-57. doi: 10.1093/hmg/ddv142. Epub 2015 May 1.
Genome-wide association studies in Crohn's disease (CD) have identified 140 genome-wide significant loci. However, identification of genes driving association signals remains challenging. Furthermore, genome-wide significance thresholds limit false positives at the expense of decreased sensitivity. In this study, we explored gene features contributing to CD pathogenicity, including gene-based association data from CD and autoimmune (AI) diseases, as well as gene expression features (eQTLs, epigenetic markers of expression and intestinal gene expression data). We developed an integrative model based on a CD reference gene set. This integrative approach outperformed gene-based association signals alone in identifying CD-related genes based on statistical validation, gene ontology enrichment, differential expression between M1 and M2 macrophages and a validation using genes causing monogenic forms of inflammatory bowel disease as a reference. Besides gene-level CD association P-values, association with AI diseases was the strongest predictor, highlighting generalized mechanisms of inflammation, and the interferon-γ pathway in particular. Within the 140 high-confidence CD regions, 598 of 1328 genes had low prioritization scores, highlighting genes unlikely to contribute to CD pathogenesis. For select regions, comparably high integrative model scores were observed for multiple genes. This is particularly evident for regions having extensive linkage disequilibrium such as the IBD5 locus. Our analyses provide a standardized reference for prioritizing potential CD-related genes, in regions with both highly significant and nominally significant gene-level association P-values. Our integrative model may be particularly valuable in prioritizing rare, potentially private, missense variants for which genome-wide evidence for association may be unattainable.