Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, Wisconsin, United States of America.
Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, Washington, United States of America.
PLoS Comput Biol. 2023 Jul 10;19(7):e1011286. doi: 10.1371/journal.pcbi.1011286. eCollection 2023 Jul.
Understanding the impact of regulatory variants on complex phenotypes is a significant challenge because the genes and pathways that are targeted by such variants and the cell type context in which regulatory variants operate are typically unknown. Cell-type-specific long-range regulatory interactions that occur between a distal regulatory sequence and a gene offer a powerful framework for examining the impact of regulatory variants on complex phenotypes. However, high-resolution maps of such long-range interactions are available only for a handful of cell types. Furthermore, identifying specific gene subnetworks or pathways that are targeted by a set of variants is a significant challenge. We have developed L-HiC-Reg, a Random Forests regression method to predict high-resolution contact counts in new cell types, and a network-based framework to identify candidate cell-type-specific gene networks targeted by a set of variants from a genome-wide association study (GWAS). We applied our approach to predict interactions in 55 Roadmap Epigenomics Mapping Consortium cell types, which we used to interpret regulatory single nucleotide polymorphisms (SNPs) in the NHGRI-EBI GWAS catalogue. Using our approach, we performed an in-depth characterization of fifteen different phenotypes including schizophrenia, coronary artery disease (CAD) and Crohn's disease. We found differentially wired subnetworks consisting of known as well as novel gene targets of regulatory SNPs. Taken together, our compendium of interactions and the associated network-based analysis pipeline leverage long-range regulatory interactions to examine the context-specific impact of regulatory variation in complex phenotypes.
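The idea of interpreting regulatory SNPs through predicted long-range interactions can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the 5 kb bin size, the count threshold, and the toy identifiers below are all assumptions made for illustration. A SNP is assigned to a genomic bin, and any gene whose promoter bin is predicted to contact that bin becomes a candidate target.

```python
from collections import defaultdict

BIN_SIZE = 5_000  # assumed bin resolution in base pairs (illustrative only)


def bin_of(position, bin_size=BIN_SIZE):
    """Return the index of the fixed-size genomic bin containing a position."""
    return position // bin_size


def snp_target_genes(snps, interactions, promoters,
                     min_count=1.0, bin_size=BIN_SIZE):
    """Map each SNP to genes whose promoter bin contacts the SNP's bin.

    snps:         dict snp_id -> (chrom, position)
    interactions: iterable of (chrom, bin_a, bin_b, predicted_count)
    promoters:    dict gene -> (chrom, tss_position)
    min_count:    hypothetical threshold on the predicted contact count
    """
    # Index genes by the bin that holds their transcription start site.
    promoter_genes = defaultdict(set)
    for gene, (chrom, tss) in promoters.items():
        promoter_genes[(chrom, bin_of(tss, bin_size))].add(gene)

    # Index interaction partners in both directions, keeping only
    # pairs whose predicted count passes the threshold.
    partners = defaultdict(set)
    for chrom, bin_a, bin_b, count in interactions:
        if count >= min_count:
            partners[(chrom, bin_a)].add(bin_b)
            partners[(chrom, bin_b)].add(bin_a)

    # Collect candidate target genes for every SNP.
    targets = {}
    for snp_id, (chrom, pos) in snps.items():
        genes = set()
        for other in partners.get((chrom, bin_of(pos, bin_size)), set()):
            genes |= promoter_genes.get((chrom, other), set())
        targets[snp_id] = sorted(genes)
    return targets


# Toy data; all identifiers and values are made up.
snps = {"rs_toy1": ("chr1", 12_500), "rs_toy2": ("chr1", 101_000)}
promoters = {"GENE_A": ("chr1", 52_000), "GENE_B": ("chr1", 203_000)}
interactions = [("chr1", 2, 10, 3.2), ("chr1", 20, 40, 0.4)]

result = snp_target_genes(snps, interactions, promoters)
```

With the default threshold, only the first toy interaction is retained, so `rs_toy1` is linked to `GENE_A` and `rs_toy2` has no candidate target. Running this per cell type and comparing the resulting SNP-to-gene links would be one simple way to look for cell-type-specific differences in wiring.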