Kuzmanovski Vladimir, Todorovski Ljupco, Džeroski Sašo
Department of Knowledge Technologies, Jožef Stefan Institute, Jamova cesta 39, 1000 Ljubljana, Slovenia.
Faculty of Public Administration, University of Ljubljana, Gosarjeva ulica 5, 1000 Ljubljana, Slovenia.
Gigascience. 2018 Nov 1;7(11):giy118. doi: 10.1093/gigascience/giy118.
The generalized relevance network approach to network inference reconstructs network links from the strength of association between the data observed at pairs of network nodes. It can reconstruct undirected networks, i.e., relevance networks sensu stricto, as well as directed networks, referred to as causal relevance networks. The generalized approach has three interchangeable components: an arbitrary measure of pairwise association between nodes, an arbitrary scoring scheme that transforms the associations into weights of the network links, and a method for inferring the directions of the links. While this makes the approach powerful and flexible, it introduces the challenge of finding a combination of components that performs well on a given inference task.
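The three components described above can be illustrated with a minimal sketch. It is not the authors' implementation: the choice of absolute Pearson correlation as the association measure, the identity scoring scheme (association used directly as link weight), the lag-1 correlation used to orient links, and all function names are assumptions made only for illustration.

```python
import numpy as np

def association_matrix(data):
    """Undirected case: absolute Pearson correlation between columns (nodes).

    data has shape (n_samples, n_nodes). Self-associations are set to zero.
    """
    assoc = np.abs(np.corrcoef(data, rowvar=False))
    np.fill_diagonal(assoc, 0.0)
    return assoc

def lagged_association_matrix(series, lag=1):
    """Directed case (assumed heuristic): entry [i, j] is
    |corr(x_i[t], x_j[t + lag])|, read as the weight of the link i -> j.

    series has shape (n_time_points, n_nodes).
    """
    past, future = series[:-lag], series[lag:]
    n = series.shape[1]
    # corrcoef stacks the two sets of variables into one (2n, 2n) matrix;
    # the upper-right block holds the past-vs-future correlations.
    full = np.corrcoef(past, future, rowvar=False)
    assoc = np.abs(full[:n, n:])
    np.fill_diagonal(assoc, 0.0)
    return assoc

def infer_links(weights, threshold):
    """Keep links whose weight reaches the threshold.

    Returns (source, target, weight) triples. For a symmetric weight
    matrix each undirected link appears once in each orientation.
    """
    rows, cols = np.nonzero(weights >= threshold)
    return [(int(i), int(j), float(weights[i, j])) for i, j in zip(rows, cols)]
```

Swapping `association_matrix` for another measure, or inserting a scoring function between the association step and `infer_links`, yields a different variant of the approach; the number of such combinations is what makes component selection nontrivial.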
We address this challenge with an extensive empirical analysis of the performance of 114 variants of the generalized relevance network approach on 47 tasks of gene network inference from time-series data and 39 tasks of gene network inference from steady-state data. We compare the variants in a multi-objective manner, considering their rankings under several performance metrics. The results yield a set of recommendations for selecting an appropriate variant of the approach in different data settings.
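One way to picture a multi-objective comparison of rankings is sketched below. The abstract does not specify the exact procedure, so this is an assumption: each variant is ranked separately under each metric, and a variant is kept if no other variant is at least as well ranked on every metric and strictly better on one (Pareto non-dominance). The function names and the example scores are hypothetical.

```python
import numpy as np

def rank_per_metric(scores):
    """Rank variants within each metric column (1 = best).

    scores has shape (n_variants, n_metrics); higher scores are better.
    Ties are broken by variant order (stable sort), not averaged.
    """
    order = np.argsort(-scores, axis=0, kind="stable")
    ranks = np.empty_like(order)
    n_variants = scores.shape[0]
    for m in range(scores.shape[1]):
        ranks[order[:, m], m] = np.arange(1, n_variants + 1)
    return ranks

def non_dominated(ranks):
    """Indices of variants not dominated by any other variant."""
    keep = []
    for i in range(ranks.shape[0]):
        dominated = any(
            np.all(ranks[j] <= ranks[i]) and np.any(ranks[j] < ranks[i])
            for j in range(ranks.shape[0])
            if j != i
        )
        if not dominated:
            keep.append(i)
    return keep

# Hypothetical scores for four variants under two metrics.
scores = np.array([[0.9, 0.2], [0.5, 0.5], [0.4, 0.4], [0.1, 0.8]])
front = non_dominated(rank_per_metric(scores))
```

In this toy example the third variant is ranked below the second under both metrics, so it drops out, while the other three each represent a different trade-off.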
Association measures based on correlation, combined with a particular scoring scheme of asymmetric weighting, yield the best performance of the relevance network approach in the general case. In two special cases, inference tasks involving short time-series data and/or large networks, association measures based on identifying qualitative trends in the time series are more appropriate.