Department of Mathematics and Computer Science, Emory University, Atlanta, GA 30322, USA.
Department of Human Genetics, Emory University School of Medicine, Atlanta, GA 30322, USA.
Bioinformatics. 2020 Feb 1;36(3):690-697. doi: 10.1093/bioinformatics/btz669.
Annotating a given genomic locus or a set of genomic loci is an important yet challenging task. This is especially true for the non-coding part of the genome, which is enormous yet poorly understood. Since gene set enrichment analyses have proven to be an effective approach for annotating a set of genes, the same idea can be extended to explore the enrichment of functional elements or features in a set of genomic intervals to reveal potential functional connections.
In this study, we describe a novel computational strategy named loci2path that takes advantage of the newly emerged, genome-wide and tissue-specific expression quantitative trait loci (eQTL) information to help annotate a set of genomic intervals in terms of transcription regulation. By checking the presence or absence of millions of eQTLs in a set of input genomic intervals, and by grouping eQTLs according to the pathways or gene sets that their target genes belong to, loci2path builds a bridge connecting genomic intervals to functional pathways and pre-defined, biologically meaningful gene sets, revealing potential regulatory connections. Our method enjoys two key advantages over existing methods: first, we no longer rely on proximity to link a locus to a gene, which has been shown to be unreliable; second, eQTLs allow us to provide the regulatory annotation in the context of specific tissue types. To demonstrate its utility, we apply loci2path to sets of genomic intervals harboring disease-associated variants as queries. Using 1 702 612 eQTLs discovered by the Genotype-Tissue Expression (GTEx) project across 44 tissues and 6320 pathways or gene sets cataloged in MSigDB as the annotation resource, our method successfully identifies highly relevant biological pathways and reveals disease mechanisms for psoriasis and other immune-related diseases. Tissue specificity analysis of the associated eQTLs provides additional evidence of the distinct roles that different tissues play in the disease mechanisms.
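The core idea, linking query intervals to gene sets through the eQTLs they contain, can be illustrated with a minimal Python sketch. This is not the loci2path package API (the package itself is distributed through Bioconductor), and the abstract does not state which statistical test is used; the hypergeometric test below is an assumption chosen for illustration. The function names (`in_any_interval`, `hypergeom_sf`, `enrich`) and the toy data are hypothetical.

```python
from math import comb

def in_any_interval(chrom, pos, intervals):
    """True if (chrom, pos) falls inside any (chrom, start, end) interval, ends inclusive."""
    return any(c == chrom and s <= pos <= e for c, s, e in intervals)

def hypergeom_sf(k, M, n, N):
    """P(X >= k) when drawing N items from a population of M that contains n successes."""
    upper = min(n, N)
    hits = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1))
    return hits / comb(M, N)

def enrich(query, eqtls, gene_sets):
    """For each gene set, count eQTLs in the query regions whose target gene
    belongs to the set, and score the count with a hypergeometric test
    (an assumed test, not necessarily the one loci2path uses)."""
    M = len(eqtls)                                   # all eQTLs in the resource
    hits = [e for e in eqtls if in_any_interval(e[0], e[1], query)]
    N = len(hits)                                    # eQTLs inside the query regions
    result = {}
    for name, genes in gene_sets.items():
        n = sum(1 for _, _, g in eqtls if g in genes)  # eQTLs targeting the set
        k = sum(1 for _, _, g in hits if g in genes)   # ...that also lie in the query
        result[name] = (k, hypergeom_sf(k, M, n, N))
    return result

# Toy data: (chromosome, position, target gene). Purely illustrative.
eqtls = [
    ("chr1", 100, "A"), ("chr1", 150, "B"), ("chr1", 900, "C"),
    ("chr2", 50, "A"), ("chr2", 500, "D"), ("chr2", 700, "E"),
]
gene_sets = {"set1": {"A", "B"}, "set2": {"C", "D", "E"}}
query = [("chr1", 90, 200)]

result = enrich(query, eqtls, gene_sets)
for name, (k, p) in sorted(result.items(), key=lambda kv: kv[1][1]):
    print(f"{name}: eQTLs in query = {k}, p = {p:.3f}")
```

Note that the link from interval to gene goes through the eQTL's target gene rather than through genomic proximity. A tissue-specific analysis could, under the same assumptions, repeat the call once per tissue with that tissue's eQTL list.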
loci2path is published as an open-source Bioconductor package and is available at http://bioconductor.org/packages/release/bioc/html/loci2path.html.
Supplementary data are available at Bioinformatics online.