The PFP and ESG protein function prediction methods in 2014: effect of database updates and ensemble approaches.

Author information

Khan Ishita K, Wei Qing, Chapman Samuel, Kc Dukka B, Kihara Daisuke

Affiliations

Department of Computer Sciences, Purdue University, West Lafayette, IN 47907 USA.

Department of Computational Science and Engineering, North Carolina A & T State University, Greensboro, NC 27411 USA.

Publication information

Gigascience. 2015 Sep 14;4:43. doi: 10.1186/s13742-015-0083-4. eCollection 2015.

Abstract

BACKGROUND

Functional annotation of novel proteins is one of the central problems in bioinformatics. As genome sequencing technologies continue to advance, more and more sequence information is becoming available to analyze and annotate. To achieve fast, automatic function annotation, many computational (automated) function prediction (AFP) methods have been developed. To objectively evaluate the performance of such methods on a large scale, community-wide assessment experiments have been conducted. The second round of the Critical Assessment of Function Annotation (CAFA) experiment was held in 2013-2014. The evaluation of participating groups was reported at a special interest group meeting at the Intelligent Systems for Molecular Biology (ISMB) conference in Boston in 2014. Our group participated in both CAFA1 and CAFA2 using multiple in-house AFP methods. Here, we report benchmark results for our methods, obtained while preparing for CAFA2 and before we submitted function predictions for the CAFA2 targets.

RESULTS

For CAFA2, we updated the annotation databases used by our methods, protein function prediction (PFP) and extended similarity group (ESG), and benchmarked their function prediction performance using the original (older) and updated databases. Performance evaluations for PFP under different settings and for ESG are discussed. We also developed two ensemble methods that combine function predictions from six independent, sequence-based AFP methods. We further analyzed the performance of our prediction methods when the predictions were enriched with the prior distribution of gene ontology (GO) terms. Examples of predictions by the ensemble methods are discussed.
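The abstract does not say how the two ensemble methods combine their component predictions, so the following is only a generic illustration of the idea, not the authors' procedure. It is a minimal Python sketch of a weighted-average consensus over per-method GO term scores. The function name `consensus`, the `weights` parameter, and the placeholder term labels are all assumptions made for this example.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def consensus(method_scores: List[Dict[str, float]],
              weights: Optional[List[float]] = None) -> Dict[str, float]:
    """Combine GO term scores from several methods by weighted averaging.

    Hypothetical illustration only: the source does not specify the
    combination rule used by its ensemble methods.
    """
    if weights is None:
        # Assumption: treat every component method equally by default.
        weights = [1.0] * len(method_scores)
    total = sum(weights)
    combined: Dict[str, float] = defaultdict(float)
    for scores, weight in zip(method_scores, weights):
        for term, score in scores.items():
            # A method that did not predict a term contributes 0 for it.
            combined[term] += weight * score
    return {term: value / total for term, value in combined.items()}


# Toy scores for one protein from two hypothetical component methods.
method_a = {"GO:A": 0.8, "GO:B": 0.2}
method_b = {"GO:A": 0.6}
merged = consensus([method_a, method_b])
```

With equal weights, a term predicted by only one method has its score halved here, which is one simple way a consensus can favor terms that several methods agree on.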

CONCLUSIONS

Updating the annotation database was successful, improving the Fmax prediction accuracy score for both PFP and ESG. Adding the prior distribution of GO terms yielded little improvement. Both of the ensemble methods we developed improved the average Fmax score over all individual component methods except for ESG. Our benchmark results will not only complement the overall assessment that will be done by the CAFA organizers, but also help elucidate the predictive power of sequence-based function prediction methods in general.
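For readers unfamiliar with Fmax, the sketch below shows how the score is commonly computed in CAFA-style, protein-centric evaluation: it is the maximum, over score thresholds, of the harmonic mean of average precision and average recall. The abstract gives no evaluation details, so the threshold grid (`steps`), the data layout, and the toy term labels are assumptions for illustration. The code also assumes that every benchmark protein has at least one true term.

```python
from typing import Dict, Set


def fmax(predictions: Dict[str, Dict[str, float]],
         truth: Dict[str, Set[str]],
         steps: int = 100) -> float:
    """Best F-measure over score thresholds (protein-centric).

    predictions: protein -> {GO term: confidence score in [0, 1]}
    truth:       protein -> non-empty set of true GO terms
    """
    best = 0.0
    for i in range(1, steps + 1):
        threshold = i / steps
        precisions = []
        recall_sum = 0.0
        for protein, true_terms in truth.items():
            scored = predictions.get(protein, {})
            predicted = {t for t, s in scored.items() if s >= threshold}
            hits = len(predicted & true_terms)
            # Precision is averaged only over proteins that have at
            # least one prediction at this threshold.
            if predicted:
                precisions.append(hits / len(predicted))
            # Recall is averaged over all benchmark proteins.
            recall_sum += hits / len(true_terms)
        if not precisions:
            continue
        precision = sum(precisions) / len(precisions)
        recall = recall_sum / len(truth)
        if precision + recall > 0:
            best = max(best, 2 * precision * recall / (precision + recall))
    return best


# Toy benchmark with made-up term labels.
toy_predictions = {"P1": {"GO:A": 0.9, "GO:B": 0.4}, "P2": {"GO:C": 0.8}}
toy_truth = {"P1": {"GO:A"}, "P2": {"GO:C", "GO:D"}}
score = fmax(toy_predictions, toy_truth)
```

In this toy case the best threshold lies between 0.4 and 0.8, where precision is 1.0 and recall is 0.75, giving an Fmax of 6/7 (about 0.857).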


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5152/4570625/25f70e12a03c/13742_2015_83_Fig1_HTML.jpg
