Lee Jusang, Jo Kyuri, Lee Sunwon, Kang Jaewoo, Kim Sun
Department of Computer Science and Engineering, Seoul National University, Seoul, Republic of Korea.
Department of Computer Science and Engineering, Korea University, Seoul, Republic of Korea.
BMC Bioinformatics. 2016 Dec 23;17(Suppl 17):477. doi: 10.1186/s12859-016-1335-8.
The primary goal of pathway analysis using transcriptome data is to find significantly perturbed pathways. However, pathway analysis does not always succeed in identifying the pathways that are truly relevant to the context under study. A major reason for this difficulty is that a single gene is often involved in multiple pathways. In the KEGG pathway database, 146 genes are each involved in more than 20 pathways. Thus, activation of even a single gene can result in the apparent activation of many pathways. This many-to-many relationship between genes and pathways often makes pathway analysis very difficult. While much more powerful pathway analysis methods are still needed, a readily available alternative is to incorporate information from the literature.
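The gene overlap described above can be measured directly from a pathway-to-gene mapping. The following is a minimal sketch, assuming the mapping is already loaded as a Python dictionary; the function name `genes_in_many_pathways` and the toy pathway and gene names are illustrative assumptions, not KEGG data and not part of the published tool.

```python
from collections import Counter


def genes_in_many_pathways(pathway_to_genes, min_pathways=20):
    """Return {gene: pathway_count} for genes in MORE than min_pathways pathways.

    pathway_to_genes is a hypothetical mapping of pathway id -> iterable of genes.
    """
    counts = Counter()
    for genes in pathway_to_genes.values():
        # set() guards against a gene being listed twice in one pathway
        counts.update(set(genes))
    return {gene: n for gene, n in counts.items() if n > min_pathways}


# Toy data for illustration only (not real KEGG membership).
toy = {
    "pathway_A": {"TP53", "AKT1"},
    "pathway_B": {"AKT1", "MAPK1"},
    "pathway_C": {"AKT1", "TP53"},
}
shared = genes_in_many_pathways(toy, min_pathways=1)
```

With a real KEGG mapping and the default threshold of 20, the size of the returned dictionary corresponds to the kind of count quoted in the abstract.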
In this study, we propose a novel approach for prioritizing pathways by combining the results of pathway analysis tools with literature information. The basic idea is as follows. When enough articles provide evidence that a pathway is relevant to the context, we can be confident that the pathway is indeed related to the context; we term this relevance. When few or no articles have been reported, we should instead rely on the results of the pathway analysis tools; we term this significance. We realized this concept as an algorithm by introducing a Context Score and an Impact Score and then combining the two into a single score. In experiments with two data sets, our method ranked truly relevant pathways significantly higher than existing pathway analysis tools did.
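The abstract does not give the formula used to merge the two scores, so the following is only a sketch of one way the idea could be expressed: the literature-based score is trusted more as the number of supporting articles grows, and the analysis-based score takes over when articles are scarce. The weighting `n / (n + k)`, the parameter `k`, and the function names are assumptions made for illustration, not the ContextTRAP algorithm.

```python
def combined_score(context_score, impact_score, n_articles, k=10.0):
    """Blend a literature-based score with an analysis-based score.

    Assumed weighting (not from the paper): w = n / (n + k), so w is 0 with
    no articles and approaches 1 as the article count grows.
    """
    if n_articles < 0 or k <= 0:
        raise ValueError("n_articles must be >= 0 and k must be > 0")
    w = n_articles / (n_articles + k)
    return w * context_score + (1.0 - w) * impact_score


def rank_pathways(records, k=10.0):
    """Sort (name, context_score, impact_score, n_articles) records by combined score."""
    scored = [(name, combined_score(c, i, n, k)) for name, c, i, n in records]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical pathways and scores, for illustration only.
records = [
    ("well_studied", 0.9, 0.3, 90),  # many articles: context score dominates
    ("unstudied", 0.0, 0.8, 0),      # no articles: impact score is used as-is
    ("weak", 0.1, 0.2, 5),
]
ranking = rank_pathways(records)
```

In this sketch a pathway with no literature support is neither penalized nor promoted by the literature; it simply keeps the score from the pathway analysis tool, which mirrors the fallback behavior the abstract describes.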
Our novel framework was implemented as ContextTRAP by utilizing two existing tools, TRAP and BEST. ContextTRAP will be a useful tool for pathway-based analysis of gene expression data, since the user can specify the context of the biological experiment as a set of keywords. The web version of ContextTRAP is available at http://biohealth.snu.ac.kr/software/contextTRAP .