An efficient algorithm to integrate network and attribute data for gene function prediction.

Authors

Vembu Shankar, Morris Quaid

Affiliation

Donnelly Center for Cellular and Biomolecular Research, University of Toronto, Toronto, ON, Canada.

Publication

Pac Symp Biocomput. 2014:388-99.

Abstract

Label propagation methods are extremely well-suited for a variety of biomedical prediction tasks based on network data. However, these algorithms cannot be used to integrate feature-based data sources with networks. We propose an efficient learning algorithm that integrates these two types of heterogeneous data sources to perform binary prediction tasks on node features (e.g., gene prioritization, disease gene prediction). Our method, LMGraph, consists of two steps. In the first step, we extract from the nodes of the networks a small set of "network features" that represent connectivity with the labeled nodes in the prediction tasks. In the second step, we apply a simple weighting scheme in conjunction with linear classifiers to combine these network features with other feature data. This two-step procedure allows us to (i) learn highly scalable and computationally efficient linear classifiers and (ii) seamlessly combine feature-based data sources with networks. Our method is much faster than label propagation, which is already known to be computationally efficient on large-scale prediction problems. Experiments on multiple functional interaction networks from three species (mouse, yeast, C. elegans) with tens of thousands of nodes and hundreds of binary prediction tasks demonstrate the efficacy of our method.
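
The two-step procedure described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the exact network features or the weighting scheme, so everything below is an assumption made for illustration: the network features are taken to be each node's total edge weight to positively and negatively labeled neighbors, the weighting is a single scalar `net_weight`, and a plain logistic regression stands in for the linear classifier. All function names and the toy data are hypothetical and are not taken from LMGraph.

```python
import math

def build_adjacency(edges):
    """Turn (u, v, weight) triples into a symmetric adjacency dict."""
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    return adj

def network_features(adj, labels):
    """Step 1 (assumed form): per node, total edge weight to positively
    labeled neighbors and to negatively labeled neighbors."""
    feats = {}
    for node, nbrs in adj.items():
        pos = sum(w for n, w in nbrs.items() if labels.get(n) == 1)
        neg = sum(w for n, w in nbrs.items() if labels.get(n) == -1)
        feats[node] = [pos, neg]
    return feats

def combine(net_feats, attr_feats, net_weight=1.0):
    """Step 2a (assumed form): scale network features by one scalar
    and concatenate them with the attribute features."""
    return {n: [net_weight * v for v in net_feats[n]] + list(attr_feats[n])
            for n in net_feats}

def train_logistic(X, y, lr=0.5, epochs=200):
    """Step 2b: a linear classifier (logistic regression, plain SGD).
    Targets y are 0/1."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for x, t in zip(X, y):
            z = b + sum(wi * xi for wi, xi in zip(w, x))
            g = 1.0 / (1.0 + math.exp(-z)) - t  # gradient of log loss w.r.t. z
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

def score(w, b, x):
    """Linear decision score; higher means more likely positive."""
    return b + sum(wi * xi for wi, xi in zip(w, x))

# Toy data (hypothetical): two gene clusters joined by one weak edge.
edges = [("a", "b", 1.0), ("a", "c", 1.0), ("b", "c", 1.0),
         ("d", "e", 1.0), ("d", "f", 1.0), ("e", "f", 1.0),
         ("c", "d", 0.1)]
labels = {"a": 1, "b": 1, "d": -1, "e": -1}   # c and f are unlabeled
attrs = {"a": [0.9], "b": [0.8], "c": [0.7],
         "d": [0.2], "e": [0.1], "f": [0.3]}  # one attribute per gene

adj = build_adjacency(edges)
net = network_features(adj, labels)
combined = combine(net, attrs, net_weight=1.0)

train_nodes = sorted(labels)
X = [combined[n] for n in train_nodes]
y = [1 if labels[n] == 1 else 0 for n in train_nodes]
w, b = train_logistic(X, y)

for n in ("c", "f"):
    print(n, round(score(w, b, combined[n]), 3))
```

Because the network is reduced to a fixed, small number of columns per node, the classifier only ever sees a dense feature matrix, which is what makes it straightforward to append attribute data and to train a linear model at scale.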
