Key Laboratory of Biomedical Information Engineering of Ministry of Education, School of Life Science and Technology, Xi'an Jiaotong University.
Brief Bioinform. 2019 Jan 18;20(1):26-32. doi: 10.1093/bib/bbx094.
Genome-wide association studies (GWASs) are an effective strategy for identifying susceptibility loci for human complex diseases. However, missing heritability remains a major problem. Most single-nucleotide polymorphisms (SNPs) identified by GWASs are located in noncoding regions, which have long been considered unexplored territory of the genome. Recently, data from the Encyclopedia of DNA Elements (ENCODE) and Roadmap Epigenomics projects have shown that many GWAS SNPs in noncoding regions fall within regulatory elements. In this study, we developed a pipeline named functional disease-associated SNPs prediction (FDSP) to identify novel susceptibility loci for complex diseases, using machine learning to interpret the functional features of known disease-associated variants. We applied the pipeline to predict novel susceptibility SNPs for type 2 diabetes (T2D) and hypertension. The predicted SNPs could explain heritability beyond that explained by GWAS-associated SNPs. Functional annotation by expression quantitative trait loci analyses showed that the target genes of the predicted SNPs were significantly enriched in T2D- or hypertension-related pathways in multiple tissues. Our results suggest that combining GWAS data with regulatory feature data can identify additional functional susceptibility SNPs for complex diseases. We hope FDSP will help identify novel susceptibility loci for complex diseases and address the missing heritability problem.
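The general idea of learning from the functional features of known disease-associated variants and then scoring candidate SNPs can be illustrated with a minimal sketch. The abstract does not name the learning algorithm or the feature set used in FDSP, so everything below is an assumption made for illustration: a plain logistic regression stands in for the learner, the three binary annotations (`in_enhancer`, `in_dnase_site`, `in_tfbs`) are hypothetical, and the training rows are made-up toy values, not data from the study.

```python
import math

# Hypothetical binary regulatory annotations per SNP (not the FDSP feature set).
FEATURES = ["in_enhancer", "in_dnase_site", "in_tfbs"]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit logistic regression by batch gradient descent; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * d
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this SNP under the current model.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def score(snp_features, w, b):
    """Predicted probability that a SNP resembles the known associated SNPs."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, snp_features)) + b)


# Toy training set, invented for illustration only.
# Label 1 = known disease-associated SNP, label 0 = background SNP.
X_train = [
    [1, 1, 1], [1, 1, 0], [1, 0, 1], [0, 1, 1],
    [0, 0, 0], [0, 0, 1], [0, 1, 0], [0, 0, 0],
]
y_train = [1, 1, 1, 1, 0, 0, 0, 0]

w, b = train_logistic(X_train, y_train)

# Score hypothetical candidate SNPs and keep those above a 0.5 cutoff
# (the cutoff is also an assumption).
candidates = {"snp_A": [1, 1, 1], "snp_B": [0, 0, 0]}
scores = {name: score(f, w, b) for name, f in candidates.items()}
predicted = [name for name, s in scores.items() if s >= 0.5]
```

In this toy setting, a candidate overlapping all three annotations scores above the cutoff and one overlapping none scores below it. A real analysis would use far richer annotations and would validate predictions on held-out SNPs.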