Huang Di, Ovcharenko Ivan
Computational Biology Branch, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20892, USA.
Nucleic Acids Res. 2015 Jan;43(1):225-36. doi: 10.1093/nar/gku1318. Epub 2014 Dec 17.
Thousands of non-coding SNPs have been linked to human diseases. Identifying the causal alleles within this pool of disease-associated non-coding SNPs remains largely impossible because the impact of non-coding variation cannot be quantified accurately. To overcome this challenge, we developed a computational model that uses the variation in ChIP-seq intensity in response to a non-coding allelic change as a proxy for quantifying the biological role of non-coding SNPs. We applied this model to HepG2 enhancers and detected 4796 enhancer SNPs capable of disrupting enhancer activity upon allelic change. These SNPs are significantly over-represented in the binding sites of the HNF4 and FOXA families of liver transcription factors and in liver eQTLs. In addition, these SNPs are strongly associated with liver GWAS traits, including type I diabetes, and are linked to abnormal levels of HDL and LDL cholesterol. Our model is directly applicable to any enhancer set for mapping causal regulatory SNPs.
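The idea of scoring a SNP by the change in predicted ChIP-seq intensity between its two alleles can be illustrated with a minimal sketch. The abstract does not describe the model's internals, so everything below is an assumption made for illustration only: the additive k-mer weight table, the toy weights, and the function names `predicted_signal` and `allelic_delta` are hypothetical and are not the authors' implementation.

```python
from typing import Dict, Iterator


def kmers(seq: str, k: int) -> Iterator[str]:
    """Yield every k-mer of seq, left to right."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def predicted_signal(seq: str, weights: Dict[str, float], k: int) -> float:
    """Sum hypothetical k-mer weights as a stand-in for predicted ChIP-seq intensity.

    Assumption: a simple additive k-mer model; k-mers missing from the table score 0.
    """
    return sum(weights.get(kmer, 0.0) for kmer in kmers(seq, k))


def allelic_delta(ref_seq: str, snp_pos: int, alt_base: str,
                  weights: Dict[str, float], k: int) -> float:
    """Return predicted signal of the alternate allele minus that of the reference.

    snp_pos is a 0-based index into ref_seq. A strongly negative value suggests
    that the allelic change weakens the predicted signal.
    """
    alt_seq = ref_seq[:snp_pos] + alt_base + ref_seq[snp_pos + 1:]
    return predicted_signal(alt_seq, weights, k) - predicted_signal(ref_seq, weights, k)


# Toy weights, invented for illustration (not derived from any real data).
TOY_WEIGHTS = {"GTA": 1.0, "TAA": 0.5}

# T -> C at index 3 removes both weighted 3-mers from the sequence.
disruptive = allelic_delta("ACGTAAC", 3, "C", TOY_WEIGHTS, 3)

# A -> T at index 0 touches no weighted 3-mer, so the prediction is unchanged.
neutral = allelic_delta("ACGTAAC", 0, "T", TOY_WEIGHTS, 3)
```

In this toy setting a SNP would be flagged as potentially disruptive when the magnitude of the delta exceeds a chosen threshold; how such a threshold is set, and how the underlying signal model is trained, is not specified in the abstract.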