Consistent prediction of GO protein localization.

Affiliations

Cifasis-Conicet, Santa Fe, Rosario, S2000EZP, Argentina.

Facultad Regional San Nicolás-UTN, Buenos Aires, San Nicolás, 2900LWH, Argentina.

Publication information

Sci Rep. 2018 May 17;8(1):7757. doi: 10.1038/s41598-018-26041-z.

Abstract

The GO-Cellular Component (GO-CC) ontology provides a controlled vocabulary for the consistent description of the subcellular compartments or macromolecular complexes where proteins may act. Current machine learning-based methods used for the automated GO-CC annotation of proteins suffer from the inconsistency of individual GO-CC term predictions. Here, we present FGGA-CC, a class of hierarchical graph-based classifiers for the consistent GO-CC annotation of protein-coding genes at the subcellular compartment or macromolecular complex levels. To boost the accuracy of GO-CC predictions, we make use of the protein localization knowledge in the GO-Biological Process (GO-BP) annotations. As a result, FGGA-CC classifiers are built from annotation data in both the GO-CC and GO-BP ontologies. Due to their graph-based design, FGGA-CC classifiers are fully interpretable and their predictions are amenable to expert analysis. Promising results were obtained on protein annotation data from five model organisms. Additionally, FGGA-CC was successfully validated on the annotation of a challenging subset of tandem duplicated genes in tomato, a non-model organism. Overall, these results suggest that FGGA-CC classifiers can indeed be useful for satisfying the huge demand for GO-CC annotation arising from ubiquitous high-throughput sequencing and proteomic projects.
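To make the notion of "inconsistent" term predictions concrete, the following minimal Python sketch checks a set of predicted terms against a hierarchy: a prediction set is treated as consistent when every predicted term is accompanied by all of its ancestors. This is only an illustration of the consistency requirement, not the FGGA-CC method. The four-term hierarchy and the function names (`ancestors`, `inconsistencies`, `close_upward`) are hypothetical and greatly simplified relative to the real GO-CC graph.

```python
# Toy hierarchy: term -> set of direct parents.
# Assumption: a simplified stand-in for GO-CC, not the real ontology graph.
PARENTS = {
    "cellular_component": set(),
    "organelle": {"cellular_component"},
    "nucleus": {"organelle"},
    "mitochondrion": {"organelle"},
}


def ancestors(term, parents=PARENTS):
    """Return every ancestor of `term`, excluding `term` itself."""
    seen = set()
    stack = list(parents[term])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def inconsistencies(predicted, parents=PARENTS):
    """List (term, missing_ancestor) pairs that break hierarchical consistency."""
    return sorted(
        (term, anc)
        for term in predicted
        for anc in ancestors(term, parents)
        if anc not in predicted
    )


def close_upward(predicted, parents=PARENTS):
    """Naive repair: add all ancestors of every predicted term."""
    closed = set(predicted)
    for term in predicted:
        closed |= ancestors(term, parents)
    return closed
```

For example, `inconsistencies({"nucleus", "cellular_component"})` reports that `organelle` is missing, while `close_upward({"nucleus"})` returns a set that passes the check. Adding ancestors after the fact is only one simple way to restore consistency; the abstract describes classifiers whose graph-based design yields consistent predictions directly.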

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/443a/5958134/88b68e59e2af/41598_2018_26041_Fig1_HTML.jpg
