GODoc: high-throughput protein function prediction using novel k-nearest-neighbor and voting algorithms.

Authors

Liu Yi-Wei, Hsu Tz-Wei, Chang Che-Yu, Liao Wen-Hung, Chang Jia-Ming

Affiliation

Department of Computer Science, National Chengchi University, 11605, Taipei, Taiwan.

Publication

BMC Bioinformatics. 2020 Nov 18;21(Suppl 6):276. doi: 10.1186/s12859-020-03556-9.

Abstract

BACKGROUND

Biological data has grown explosively with the advance of next-generation sequencing. However, annotating protein function with wet lab experiments is time-consuming. Fortunately, computational function prediction can help wet labs formulate biological hypotheses and prioritize experiments. Gene Ontology (GO) is a framework for unifying the representation of protein function in a hierarchical tree composed of GO terms.

RESULTS

We propose GODoc, a general protein GO prediction framework based on sequence information. It combines feature engineering, feature reduction, and a novel k-nearest-neighbor algorithm to address the multi-label GO prediction problem. Comprehensive evaluation on CAFA2 shows that GODoc performs better than two baseline models. In the CAFA3 competition (68 teams), GODoc ranks 10th in Cellular Component Ontology. In the species-specific task, the proposed method ranks 10th in the eukaryotic Cellular Component Ontology and 8th in the prokaryotic Molecular Function Ontology. In the term-centric task, GODoc places third for biofilm formation of Pseudomonas aeruginosa and ties for first for long-term memory of Drosophila melanogaster.
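To make the idea of k-nearest-neighbor voting for multi-label prediction concrete, here is a minimal sketch. It is not GODoc's actual algorithm, whose details the abstract does not give. The cosine similarity measure, the similarity-weighted vote, the function names `cosine_similarity` and `knn_vote`, and the placeholder labels `TERM_A` to `TERM_D` are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_vote(query, train_vectors, train_labels, k=3):
    """Score each label by similarity-weighted votes from the k nearest neighbors.

    Hypothetical voting scheme (an assumption, not taken from the paper):
    each neighbor votes for all of its labels with a weight equal to its
    similarity, and scores are normalized by the total neighbor weight,
    so every score lies in [0, 1].
    """
    sims = [(cosine_similarity(query, v), i) for i, v in enumerate(train_vectors)]
    neighbors = sorted(sims, reverse=True)[:k]
    # Clamp negative similarities to zero so they cannot cast negative votes.
    weights = [(max(s, 0.0), i) for s, i in neighbors]
    total = sum(w for w, _ in weights)
    if total == 0.0:
        return {}
    scores = defaultdict(float)
    for w, i in weights:
        for term in train_labels[i]:
            scores[term] += w / total
    return dict(scores)


# Toy data: feature vectors with placeholder label sets (not real annotations).
train_vectors = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
train_labels = [{"TERM_A"}, {"TERM_A", "TERM_B"}, {"TERM_C"}, {"TERM_D"}]

scores = knn_vote([1.0, 0.0, 0.0], train_vectors, train_labels, k=2)
for term, score in sorted(scores.items()):
    print(term, round(score, 3))
```

Because a protein can carry many GO terms at once, the vote returns a score per term rather than a single class. A threshold on those scores would then decide which terms to report.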

CONCLUSIONS

We have developed a novel and effective strategy for incorporating a training procedure into the k-nearest-neighbor algorithm (instance-based learning). The resulting method can solve the Gene Ontology multi-label prediction problem, which is especially notable given that there are thousands of Gene Ontology terms.
