Rifaioglu Ahmet Sureyya, Doğan Tunca, Saraç Ömer Sinan, Ersahin Tulin, Saidi Rabie, Atalay Mehmet Volkan, Martin Maria Jesus, Cetin-Atalay Rengul
Department of Computer Engineering, Middle East Technical University, Ankara, 06800, Turkey.
Department of Computer Engineering, İskenderun Technical University, Hatay, 31200, Turkey.
Proteins. 2018 Feb;86(2):135-151. doi: 10.1002/prot.25416. Epub 2017 Nov 29.
Recent advances in computing power and machine learning empower functional annotation of protein sequences and their transcript variations. Here, we present UniGOPred, an automated prediction system for Gene Ontology (GO) annotations, together with a database of GO term predictions for the proteomes of several organisms in the UniProt Knowledgebase (UniProtKB). UniGOPred provides function predictions for 514 molecular function (MF), 2909 biological process (BP), and 438 cellular component (CC) GO terms for each protein sequence. UniGOPred covers nearly the whole functionality spectrum of the Gene Ontology system and can predict both generic and specific GO terms. UniGOPred was run on the CAFA2 challenge target protein sequences and ranked among the top 10 best-performing methods in the molecular function category. In addition, UniGOPred outperforms the baseline BLAST classifier in all GO categories. UniGOPred predictions are also compared with UniProtKB/TrEMBL database annotations. Furthermore, we discuss the tool's ability to predict negatively associated GO terms, which define the functions that a protein does not possess. UniGOPred annotations were also validated through case studies: experimentally on PTEN protein variants, and against the literature on CHD8 protein variants. The UniGOPred protein functional annotation system is available as an open access tool at http://cansyl.metu.edu.tr/UniGOPred.html.