Broad Institute of MIT and Harvard, Cambridge, MA, USA.
Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA.
Nat Commun. 2021 Jun 7;12(1):3394. doi: 10.1038/s41467-021-23134-8.
The large majority of variants identified by GWAS are non-coding, motivating detailed characterization of the function of non-coding variants. Experimental methods that assess variants' effects on gene expression in native chromatin context via direct perturbation are low-throughput. Existing high-throughput computational predictors have thus lacked large gold-standard sets of regulatory variants for training and validation. Here, we leverage a set of 14,807 putative causal eQTLs in humans obtained through statistical fine-mapping, and we use 6,121 features to directly train a predictor of whether a variant modifies nearby gene expression. We call the resulting prediction the expression modifier score (EMS). We validate EMS by comparing its ability to prioritize functional variants with that of other major scores. We then use EMS as a prior for statistical fine-mapping of eQTLs to identify an additional 20,913 putatively causal eQTLs, and we incorporate EMS into co-localization analysis to identify 310 additional candidate genes across UK Biobank phenotypes.
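To illustrate how a functional score can act as a prior in fine-mapping, the sketch below reweights posterior inclusion probabilities (PIPs) that were computed under a uniform prior. It is a simplification and not the procedure used in the paper: it assumes exactly one causal variant per locus, so that each PIP is proportional to prior times Bayes factor, and the function name `reweight_pips` and the example numbers are hypothetical.

```python
from typing import List, Sequence


def reweight_pips(uniform_pips: Sequence[float],
                  prior_weights: Sequence[float]) -> List[float]:
    """Update PIPs from a uniform prior to a functionally informed prior.

    Assumption (illustrative only): a single causal variant per locus.
    Under that assumption PIP_i is proportional to prior_i * BF_i, and a
    uniform-prior PIP is proportional to BF_i alone, so the new PIP is
    proportional to prior_weight_i * uniform_pip_i.
    """
    if len(uniform_pips) != len(prior_weights):
        raise ValueError("inputs must have the same length")
    if any(p < 0 for p in uniform_pips) or any(w < 0 for w in prior_weights):
        raise ValueError("PIPs and prior weights must be non-negative")

    unnormalized = [p * w for p, w in zip(uniform_pips, prior_weights)]
    total = sum(unnormalized)
    if total <= 0:
        raise ValueError("at least one variant needs positive weight and PIP")
    # Renormalize so the reweighted PIPs sum to 1 across the locus.
    return [u / total for u in unnormalized]


# Hypothetical locus with three variants; the second has the highest score.
example_pips = [0.5, 0.3, 0.2]
example_weights = [1.0, 4.0, 1.0]
updated = reweight_pips(example_pips, example_weights)
print([round(x, 3) for x in updated])
```

In this toy locus the second variant overtakes the first once the prior is applied, which is the kind of reprioritization a functionally informed prior is meant to produce. Fine-mapping methods that allow several causal variants per locus need a different update.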