Department of Information Technology, Ghent University - iMinds, 9050 Gent, Belgium.
Bioinformatics. 2013 May 15;29(10):1308-16. doi: 10.1093/bioinformatics/btt142. Epub 2013 Apr 16.
When genomic data are associated with gene expression data, the resulting expression quantitative trait loci (eQTL) will likely span multiple genes. eQTL prioritization techniques can be used to select the most likely causal gene affecting the expression of a target gene from a list of candidates. As an input, these techniques use physical interaction networks that often contain highly connected genes and unreliable or irrelevant interactions that can interfere with the prioritization process. We present EPSILON, an extendable framework for eQTL prioritization, which mitigates the effect of highly connected genes and unreliable interactions by constructing a local network before a network-based similarity measure is applied to select the true causal gene.
We tested the new method on three eQTL datasets derived from yeast data using three different association techniques. A physical interaction network was constructed, and each eQTL in each dataset was prioritized using the EPSILON approach: first, a local network was constructed using a k-trials shortest path algorithm, followed by the calculation of a network-based similarity measure. Three similarity measures were evaluated: random walks, the Laplacian Exponential Diffusion kernel and the Regularized Commute-Time kernel. The aim was to predict knockout interactions from a yeast knockout compendium. EPSILON outperformed two reference prioritization methods, random assignment and shortest path prioritization. Next, we found that using a local network significantly increased prioritization performance in terms of predicted knockout pairs when compared with using exactly the same network similarity measures on the global network, with an average increase in prioritization performance of 8 percentage points (P < 10^-5).
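The similarity step described above can be illustrated with a minimal Python sketch. It is not the authors' Matlab/C++ implementation: it assumes the standard textbook definitions of the two kernels (Laplacian Exponential Diffusion as exp(-beta L), Regularized Commute-Time as (D - alpha A)^-1, with L = D - A), and the function names, the parameter values alpha and beta, and the four-gene toy network are all hypothetical. The k-trials shortest path construction of the local network is not reproduced here; the adjacency matrix simply stands in for whatever local network that step would yield.

```python
import numpy as np

def laplacian(adj):
    """Unnormalized graph Laplacian L = D - A of an undirected network."""
    adj = np.asarray(adj, dtype=float)
    return np.diag(adj.sum(axis=1)) - adj

def led_kernel(adj, beta=0.5):
    """Laplacian Exponential Diffusion kernel exp(-beta * L).

    L is symmetric, so the matrix exponential is computed from its
    eigendecomposition. beta is an assumed, illustrative value.
    """
    eigvals, eigvecs = np.linalg.eigh(laplacian(adj))
    return (eigvecs * np.exp(-beta * eigvals)) @ eigvecs.T

def rct_kernel(adj, alpha=0.9):
    """Regularized Commute-Time kernel (D - alpha * A)^-1, 0 < alpha < 1.

    alpha is an assumed, illustrative value.
    """
    adj = np.asarray(adj, dtype=float)
    deg = np.diag(adj.sum(axis=1))
    return np.linalg.inv(deg - alpha * adj)

def prioritize(kernel, candidates, target):
    """Rank candidate genes by decreasing kernel similarity to the target."""
    return sorted(candidates, key=lambda g: kernel[g, target], reverse=True)

# Hypothetical local network: a path 0 - 1 - 2 - 3.
# Gene 3 plays the role of the target; genes 0, 1 and 2 are the candidates in the eQTL.
ADJ = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
])

if __name__ == "__main__":
    for name, kernel in (("LED", led_kernel(ADJ)), ("RCT", rct_kernel(ADJ))):
        print(name, prioritize(kernel, candidates=[0, 1, 2], target=3))
```

In this toy network both kernels rank the candidate closest to the target first; on a real interaction network the kernels also account for the number and length of alternative paths, which is where they differ from plain shortest path prioritization.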
The physical interaction network and the source code (Matlab/C++) of our implementation can be downloaded from http://bioinformatics.intec.ugent.be/epsilon.
lieven.verbeke@intec.ugent.be, kamar@psb.ugent.be, jan.fostier@intec.ugent.be
Supplementary data are available at Bioinformatics online.