Evaluating the impact of topological protein features on the negative examples selection.

Affiliation

Department of Computer Science, Università degli Studi di Milano, Via Comelico 39, Milano, 20135, Italy.

Publication information

BMC Bioinformatics. 2018 Nov 20;19(Suppl 14):417. doi: 10.1186/s12859-018-2385-x.

DOI:10.1186/s12859-018-2385-x

PMID:30453879

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6245585/

Abstract

BACKGROUND

Supervised machine learning methods applied to automated protein function prediction (AFP) require both positive examples (i.e., proteins known to possess a given function) and negative examples (proteins not associated with that function). Unfortunately, publicly available proteome and genome data sources such as the Gene Ontology rarely record the functions a protein does not possess. Negative selection, which consists of identifying informative negative examples, is therefore a central and challenging problem in AFP. Several heuristics have been proposed over the years to solve this problem; nevertheless, despite their effectiveness, to the best of our knowledge no previous work has studied which protein features are most relevant to this task, that is, which features best help discriminate between reliable and unreliable negatives.

RESULTS

The present work analyses the impact of several features on the selection of negative proteins for Gene Ontology (GO) terms. The analysis is network-based: it exploits the fact that proteins can be naturally structured in a network built from pairwise relationships drawn from several sources of data, such as protein-protein and genetic interactions. The proposed protein features, which include local and global graph centrality measures and protein multifunctionality, are either term-aware (i.e., dependent on the GO term) or term-unaware (i.e., invariant across GO terms). We validated the informativeness of each feature using a temporal holdout in three different experiments on the yeast, mouse and human proteomes: (i) feature selection, to detect which protein features are most helpful for negative selection; (ii) protein function prediction, to verify whether the features considered are also useful for predicting GO terms; (iii) negative selection, applying two different negative selection algorithms to proteins represented through the proposed features.
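To make the distinction between term-unaware and term-aware features concrete, here is a minimal sketch in Python. It assumes an unweighted, undirected toy network; the graph `adj`, the set `positives`, and the function names are hypothetical, and the study's own feature definitions may differ (for instance, by using edge weights from integrated interaction data). Degree and betweenness depend only on the network, whereas the positive neighborhood depends on which proteins are annotated with the GO term under consideration.

```python
from collections import deque

def degree(adj):
    """Term-unaware local feature: number of neighbors of each node."""
    return {v: len(neigh) for v, neigh in adj.items()}

def betweenness(adj):
    """Term-unaware global feature: unnormalized node betweenness,
    computed with Brandes' algorithm for unweighted, undirected graphs."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma)
        # and recording each node's predecessors on those paths.
        order = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies in reverse BFS order.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Every unordered pair was counted once from each endpoint.
    return {v: c / 2.0 for v, c in score.items()}

def positive_neighborhood(adj, positives):
    """Term-aware feature: number of neighbors annotated with the term."""
    return {v: sum(1 for w in adj[v] if w in positives) for v in adj}

# Hypothetical toy network with edges A-B, B-C, C-D, C-E.
adj = {
    "A": ["B"],
    "B": ["A", "C"],
    "C": ["B", "D", "E"],
    "D": ["C"],
    "E": ["C"],
}
# Hypothetical positives for one GO term.
positives = {"A", "D"}

features = {
    "degree": degree(adj),
    "betweenness": betweenness(adj),
    "positive_neighborhood": positive_neighborhood(adj, positives),
}
```

In this toy network, `C` has the highest betweenness because every shortest path between the `{A, B}` side and the leaves `D` and `E` passes through it. Changing `positives` alters only the term-aware feature.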

CONCLUSIONS

Term-aware features (with some exceptions) proved more informative for problem (i), together with node betweenness, which is the most relevant of the term-unaware features. The node positive neighborhood is instead the most predictive feature for the AFP problem, while experiment (iii) showed that the proposed features allow negative selection algorithms to effectively select negative instances in the temporal holdout setting, with better results when nonlinear combinations of features are also exploited.
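The temporal holdout setting can be illustrated with a small sketch. The abstract does not spell out the protocol, so this is one plausible reading and an assumption: proteins not annotated with a term in an older annotation release are negative candidates, and a candidate counts as an unreliable negative if it gains the annotation in a later release. All names and data below are hypothetical.

```python
def temporal_holdout_labels(proteins, old_positives, new_positives):
    """Label every protein that is non-positive in the older release.

    True  -> still unannotated later (treated as a reliable negative)
    False -> annotated in the later release (an unreliable negative)
    """
    return {
        p: p not in new_positives
        for p in proteins
        if p not in old_positives
    }

def unreliable_fraction(selected, labels):
    """Fraction of selected negatives that became positive later."""
    selected = [p for p in selected if p in labels]
    if not selected:
        return 0.0
    wrong = sum(1 for p in selected if not labels[p])
    return wrong / len(selected)

# Hypothetical annotations for one GO term in two releases.
proteins = ["P1", "P2", "P3", "P4", "P5", "P6"]
old_positives = {"P1"}
new_positives = {"P1", "P2", "P3"}

labels = temporal_holdout_labels(proteins, old_positives, new_positives)

# Negatives picked by some hypothetical selection algorithm,
# compared with taking every non-positive protein as a negative.
selected_by_algorithm = ["P4", "P5", "P2"]
all_candidates = list(labels)

algo_rate = unreliable_fraction(selected_by_algorithm, labels)
baseline_rate = unreliable_fraction(all_candidates, labels)
```

Here the hypothetical algorithm picks one unreliable negative out of three, against two out of five for the baseline that takes every candidate. A lower fraction means the selection avoided proteins that were annotated later.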

Cited by

Correction to: Evaluating the impact of topological protein features on the negative examples selection.

BMC Bioinformatics. 2018 Dec 17;19(1):530. doi: 10.1186/s12859-018-2545-z.
