PANNZER: high-throughput functional annotation of uncharacterized proteins in an error-prone environment.

Authors

Koskinen Patrik, Törönen Petri, Nokso-Koivisto Jussi, Holm Liisa

Affiliations

Department of Biosciences, University of Helsinki, 00014 Helsinki, Finland and Institute of Biotechnology, University of Helsinki, 00014 Helsinki, Finland.

Publication information

Bioinformatics. 2015 May 15;31(10):1544-52. doi: 10.1093/bioinformatics/btu851. Epub 2015 Jan 8.

Abstract

MOTIVATION

The last decade has seen remarkable growth in protein databases. This growth comes at a price: a growing number of submitted protein sequences lack functional annotation. Approximately 32% of the sequences submitted to UniProtKB, the most comprehensive protein database, are labelled 'Unknown protein' or similar. Moreover, the functionally annotated entries are reported to contain 30-40% errors. Here, we introduce a high-throughput tool for more reliable functional annotation called Protein ANNotation with Z-score (PANNZER). PANNZER predicts Gene Ontology (GO) classes and free-text descriptions of protein function. PANNZER uses weighted k-nearest neighbour methods with statistical testing to maximize the reliability of a functional annotation.
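The abstract does not give PANNZER's scoring formulas, so the following is only a minimal sketch of the general idea of combining a weighted k-nearest-neighbour vote with a statistical test. Everything in it is an assumption made for illustration: the function name `score_go_terms`, the use of a similarity score as the neighbour weight, the background frequency table, and the particular Z-score (a weighted proportion compared against its expectation under a Bernoulli null). It should not be read as the published method.

```python
import math
from collections import defaultdict


def score_go_terms(neighbours, background_freq, k=3):
    """Illustrative weighted k-NN vote with a Z-score per GO term.

    neighbours      -- list of (similarity, set_of_go_terms) for database hits
                       (hypothetical input format)
    background_freq -- dict mapping GO term -> fraction of database proteins
                       carrying that term (hypothetical null model)
    k               -- number of most similar neighbours to use
    """
    # Keep the k neighbours with the highest similarity to the query.
    top = sorted(neighbours, key=lambda n: n[0], reverse=True)[:k]
    total_w = sum(w for w, _ in top)
    sum_w2 = sum(w * w for w, _ in top)
    if total_w <= 0:
        return {}

    # Weighted vote: each neighbour contributes its similarity to its terms.
    votes = defaultdict(float)
    for w, terms in top:
        for term in terms:
            votes[term] += w

    scores = {}
    for term, vote in votes.items():
        p = background_freq.get(term)
        if p is None or not 0.0 < p < 1.0:
            continue  # no usable null expectation for this term
        observed = vote / total_w
        # Standard deviation of a weighted mean of independent Bernoulli(p)
        # draws: sqrt(p * (1 - p) * sum(w_i^2)) / sum(w_i).
        sd = math.sqrt(p * (1.0 - p) * sum_w2) / total_w
        scores[term] = (observed - p) / sd
    return scores
```

In this sketch, a term shared by the most similar neighbours but rare in the database receives a high Z-score, while a term that is common everywhere scores near zero even when several neighbours carry it. That is the sense in which a statistical test can make a nearest-neighbour annotation more robust to erroneous or uninformative database entries.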

RESULTS

Our results in free-text description line prediction show that we outperformed all competing methods by a clear margin. In GO prediction, we show a clear improvement over our older method, which performed well in the CAFA 2011 challenge.
