Large-scale automated function prediction of protein sequences and an experimental case study validation on PTEN transcript variants.

Author information

Rifaioglu Ahmet Sureyya, Doğan Tunca, Saraç Ömer Sinan, Ersahin Tulin, Saidi Rabie, Atalay Mehmet Volkan, Martin Maria Jesus, Cetin-Atalay Rengul

Affiliations

Department of Computer Engineering, Middle East Technical University, Ankara, 06800, Turkey.

Department of Computer Engineering, İskenderun Technical University, Hatay, 31200, Turkey.

Publication information

Proteins. 2018 Feb;86(2):135-151. doi: 10.1002/prot.25416. Epub 2017 Nov 29.

Abstract

Recent advances in computing power and machine learning have enabled the functional annotation of protein sequences and their transcript variants. Here, we present UniGOPred, an automated prediction system for Gene Ontology (GO) annotations, together with a database of GO term predictions for the proteomes of several organisms in the UniProt Knowledgebase (UniProtKB). UniGOPred provides function predictions for 514 molecular function (MF), 2909 biological process (BP), and 438 cellular component (CC) GO terms for each protein sequence. UniGOPred covers nearly the whole functional spectrum of the Gene Ontology and can predict both generic and specific GO terms. When run on the CAFA2 challenge target protein sequences, UniGOPred ranked among the top 10 best-performing methods in the molecular function category. In addition, UniGOPred outperforms the baseline BLAST classifier in all GO categories. UniGOPred predictions are also compared with UniProtKB/TrEMBL database annotations. Furthermore, the tool's ability to predict negatively associated GO terms, which define functions that a protein does not possess, is discussed. UniGOPred annotations were also validated through two case studies: experimentally on PTEN protein variants, and through the literature on CHD8 protein variants. The UniGOPred protein functional annotation system is available as an open access tool at http://cansyl.metu.edu.tr/UniGOPred.html.
