Predicting genes involved in human cancer using network contextual information.

Authors

Rahmani Hossein, Blockeel Hendrik, Bender Andreas

Affiliation

Leiden Institute of Advanced Computer Science, Universiteit Leiden, Niels Bohrweg 1, 2333 CA Leiden, The Netherlands.

Publication

J Integr Bioinform. 2012 Sep 5;9(1):210. doi: 10.2390/biecoll-jib-2012-210.

Abstract

Protein-Protein Interaction (PPI) networks have been widely used for the task of predicting proteins involved in cancer. Previous research has shown that functional information about the protein for which a prediction is made, proximity to specific other proteins in the PPI network, as well as local network structure are informative features in this respect. In this work, we introduce two new types of input features, reflecting additional information: (1) Functional Context: the functions of proteins interacting with the target protein (rather than the protein itself); and (2) Structural Context: the relative position of the target protein with respect to specific other proteins selected according to a novel ANOVA (analysis of variance) based measure. We also introduce a selection strategy to pinpoint the most informative features. Results show that the proposed feature types and feature selection strategy yield informative features. A standard machine learning method (Naive Bayes) that uses the features proposed here outperforms the current state-of-the-art methods by more than 5% with respect to F-measure. In addition, manual inspection confirms the biological relevance of the top-ranked features.
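To make the two feature types concrete, the following is a minimal Python sketch, not the authors' implementation. The abstract does not specify how either feature is computed, so everything here is an assumption: the toy network `PPI`, the annotations in `FUNCTIONS`, the protein names, and the helper names are all hypothetical. Functional context is illustrated as counts of the functions annotated to a protein's interaction partners. Structural context is illustrated by scoring a candidate anchor protein with a plain one-way ANOVA F statistic over shortest-path distances, grouped by cancer label; the paper's "novel ANOVA based measure" may well differ from this.

```python
from collections import Counter, deque

# Toy, made-up PPI network (undirected adjacency lists). Hypothetical data.
PPI = {
    "P1": ["P2", "P3"],
    "P2": ["P1", "P3", "P4"],
    "P3": ["P1", "P2"],
    "P4": ["P2", "P5"],
    "P5": ["P4"],
}
# Made-up function annotations per protein. Hypothetical data.
FUNCTIONS = {
    "P1": {"kinase"},
    "P2": {"kinase", "dna_repair"},
    "P3": {"transport"},
    "P4": {"dna_repair"},
    "P5": {"transport"},
}


def functional_context(protein):
    """Count the functions of a protein's interaction partners (not its own)."""
    counts = Counter()
    for neighbor in PPI[protein]:
        counts.update(FUNCTIONS[neighbor])
    return counts


def distances_from(source):
    """Breadth-first shortest-path distances from source to reachable proteins."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in PPI[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def anova_f(groups):
    """One-way ANOVA F statistic for a list of non-empty numeric groups."""
    values = [v for g in groups for v in g]
    grand_mean = sum(values) / len(values)
    between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    if within == 0:
        return float("inf") if between > 0 else 0.0
    return (between / df_between) / (within / df_within)


def anchor_score(anchor, labels):
    """Assumed scoring rule: F statistic of distances to the anchor,
    comparing cancer-labelled proteins against the rest."""
    dist = distances_from(anchor)
    cancer = [dist[p] for p, y in labels.items() if y and p != anchor]
    other = [dist[p] for p, y in labels.items() if not y and p != anchor]
    return anova_f([cancer, other])


# Hypothetical labels: True marks a known cancer-related protein.
LABELS = {"P1": True, "P2": True, "P3": False, "P4": False, "P5": False}
```

Under these assumptions, anchors with a high `anchor_score` would be kept, and the distance to each kept anchor would be combined with the `functional_context` counts into a feature vector for a standard classifier such as Naive Bayes.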
