Schrynemackers Marie, Wehenkel Louis, Babu M Madan, Geurts Pierre
Department of EE and CS & GIGA-R, University of Liège, Belgium.
Mol Biosyst. 2015 Aug;11(8):2116-25. doi: 10.1039/c5mb00174a.
Networks are ubiquitous in biology, and computational approaches to their inference have been widely investigated. In particular, supervised machine learning methods can be used to complete a partially known network by integrating various measurements. Two main supervised frameworks have been proposed: the local approach, which trains a separate model for each network node, and the global approach, which trains a single model over pairs of nodes. Here, we systematically investigate, theoretically and empirically, the use of tree-based ensemble methods within these two approaches for biological network inference. We first formalize network inference as a classification problem over pairs of nodes, unifying homogeneous and bipartite graphs in the process and discussing two main sampling schemes. We then present the global and local approaches, extending the latter to predict interactions between two unseen network nodes, and discuss their specializations to tree-based ensemble methods, highlighting their interpretability and drawing links with clustering techniques. Extensive computational experiments on various biological networks show that these methods are competitive with existing methods.
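The difference between the global and local approaches described in the abstract can be illustrated by how each one builds its training data from a partially known homogeneous graph. The following is only a minimal sketch, not the authors' implementation: the function names, the toy node features, and the choice to represent a pair by concatenating the two nodes' feature tuples (in both orders, since pairs are unordered) are all assumptions made for illustration. The learner itself (for example a tree-based ensemble) is deliberately left out.

```python
from itertools import combinations


def global_training_set(features, known_edges, nodes):
    """Global approach: a single dataset whose rows are pairs of nodes.

    Assumption: a pair is represented by concatenating the two nodes'
    feature tuples, and each unordered pair is added in both orders.
    """
    rows = []
    for u, v in combinations(sorted(nodes), 2):
        label = 1 if frozenset((u, v)) in known_edges else 0
        rows.append((features[u] + features[v], label))
        rows.append((features[v] + features[u], label))
    return rows


def local_training_sets(features, known_edges, nodes):
    """Local approach: one dataset per node.

    For node u, each row is another node v described by its own features,
    labeled 1 if the edge (u, v) is known and 0 otherwise.
    """
    datasets = {}
    for u in sorted(nodes):
        datasets[u] = [
            (features[v], 1 if frozenset((u, v)) in known_edges else 0)
            for v in sorted(nodes)
            if v != u
        ]
    return datasets


# Hypothetical toy data: three nodes, two features each, one known edge.
features = {"a": (0.1, 1.0), "b": (0.2, 0.9), "c": (0.9, 0.1)}
known_edges = {frozenset(("a", "b"))}
nodes = ["a", "b", "c"]

global_rows = global_training_set(features, known_edges, nodes)
local_sets = local_training_sets(features, known_edges, nodes)
```

In this sketch the global approach would fit one classifier on `global_rows`, while the local approach would fit one classifier per key of `local_sets`. Predicting an interaction between two unseen nodes, which the abstract mentions as an extension of the local approach, is not covered here.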