Vembu Shankar, Morris Quaid
Donnelly Center for Cellular and Biomolecular Research, University of Toronto, Toronto, ON, Canada.
Pac Symp Biocomput. 2014:388-99.
Label propagation methods are well suited to a variety of biomedical prediction tasks based on network data. However, these algorithms cannot be used to integrate feature-based data sources with networks. We propose an efficient learning algorithm to integrate these two types of heterogeneous data sources to perform binary prediction tasks on node features (e.g., gene prioritization, disease gene prediction). Our method, LMGraph, consists of two steps. In the first step, we extract a small set of "network features" from the nodes of networks that represent connectivity with labeled nodes in the prediction tasks. In the second step, we apply a simple weighting scheme in conjunction with linear classifiers to combine these network features with other feature data. This two-step procedure allows us to (i) learn highly scalable and computationally efficient linear classifiers, and (ii) seamlessly combine feature-based data sources with networks. Our method is much faster than label propagation, which is already known to be computationally efficient on large-scale prediction problems. Experiments on multiple functional interaction networks from three species (mouse, yeast, C. elegans) with tens of thousands of nodes and hundreds of binary prediction tasks demonstrate the efficacy of our method.
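The two-step procedure can be illustrated with a minimal sketch. The abstract does not specify the exact network features, the weighting scheme, or the linear classifier, so everything below is an assumption made for illustration: the network features are taken to be each node's total edge weight to positively and negatively labeled nodes, the weighting is a single scalar `weight`, and the classifier is logistic regression trained by plain gradient descent. All function names and the toy data are hypothetical and are not taken from LMGraph itself.

```python
import numpy as np

def network_features(adjacency, labels):
    """Step 1 (assumed form): connectivity of every node to labeled nodes.

    adjacency: (n, n) symmetric edge-weight matrix.
    labels:    (n,) array with +1, -1, or 0 for unlabeled nodes.
    Returns an (n, 2) array: summed edge weight to positive and to
    negative labeled nodes.
    """
    pos = (labels == 1).astype(float)
    neg = (labels == -1).astype(float)
    return np.column_stack([adjacency @ pos, adjacency @ neg])

def combine(net_feats, other_feats, weight=0.5):
    """Step 2a (assumed form): scale the two feature blocks and concatenate."""
    return np.hstack([weight * net_feats, (1.0 - weight) * other_feats])

def add_bias(X):
    """Append a constant column so the classifier learns an intercept."""
    return np.hstack([X, np.ones((X.shape[0], 1))])

def train_logistic(X, y, lr=0.1, epochs=500):
    """Step 2b: linear classifier (logistic loss, gradient descent), y in {-1, +1}."""
    Xb = add_bias(X)
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        margins = y * (Xb @ w)
        # Gradient of mean log(1 + exp(-margin)) with respect to w.
        grad = -(Xb * (y / (1.0 + np.exp(margins)))[:, None]).mean(axis=0)
        w -= lr * grad
    return w

def predict_scores(X, w):
    """Signed scores; positive means predicted positive."""
    return add_bias(X) @ w

# Toy network (made up): two triangles {0,1,2} and {3,4,5} joined by edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

labels = np.array([1, 1, 0, 0, -1, -1])                        # nodes 2 and 3 are unlabeled
other = np.array([[0.9], [0.8], [0.7], [0.2], [0.1], [0.0]])   # one made-up feature column

net = network_features(A, labels)
X = combine(net, other, weight=0.5)

labeled = labels != 0
w = train_logistic(X[labeled], labels[labeled].astype(float))
scores = predict_scores(X, w)
```

In this sketch the classifier is trained only on the labeled nodes and then scores every node, including the unlabeled ones. Because the network is reduced to a few columns per task, training cost depends on the number of features rather than on repeated propagation over the graph, which is one plausible reading of why a linear approach could be faster than label propagation.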