GODoc: high-throughput protein function prediction using novel k-nearest-neighbor and voting algorithms.

Authors

Liu Yi-Wei, Hsu Tz-Wei, Chang Che-Yu, Liao Wen-Hung, Chang Jia-Ming

Affiliation

Department of Computer Science, National Chengchi University, 11605, Taipei, Taiwan.

Publication

BMC Bioinformatics. 2020 Nov 18;21(Suppl 6):276. doi: 10.1186/s12859-020-03556-9.

DOI: 10.1186/s12859-020-03556-9
PMID: 33203348
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7672824/
Abstract

BACKGROUND

Biological data has grown explosively with the advance of next-generation sequencing. However, annotating protein function with wet lab experiments is time-consuming. Fortunately, computational function prediction can help wet labs formulate biological hypotheses and prioritize experiments. Gene Ontology (GO) is a framework for unifying the representation of protein function in a hierarchical tree composed of GO terms.
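As a rough illustration of what a hierarchy of GO terms implies for annotation, the sketch below expands a set of terms with all of their ancestors, so that annotating a specific term also implies its more general parents. The term names and the `PARENTS` mapping are hypothetical toy data, not real GO identifiers, and the toy hierarchy lets a term have more than one parent.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical toy hierarchy: each term maps to its direct parents.
# These names are placeholders, not real GO identifiers.
PARENTS: Dict[str, List[str]] = {
    "root": [],
    "organelle": ["root"],
    "membrane": ["root"],
    "nucleus": ["organelle"],
    "nuclear_membrane": ["nucleus", "membrane"],
}


def with_ancestors(terms: Iterable[str], parents: Dict[str, List[str]]) -> Set[str]:
    """Return the given terms together with every ancestor reachable upward."""
    seen: Set[str] = set()
    stack = list(terms)
    while stack:
        term = stack.pop()
        if term in seen:
            continue  # already expanded via another path
        seen.add(term)
        stack.extend(parents.get(term, []))
    return seen


if __name__ == "__main__":
    print(sorted(with_ancestors({"nuclear_membrane"}, PARENTS)))
```

Expanding annotations this way keeps predictions consistent with the hierarchy: a protein predicted to carry a specific term is also assigned every more general term above it.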

RESULTS

We propose GODoc, a general protein GO prediction framework based on sequence information which combines feature engineering, feature reduction, and a novel k-nearest-neighbor algorithm to resolve the multiple GO prediction problem. Comprehensive evaluation on CAFA2 shows that GODoc performs better than two baseline models. In the CAFA3 competition (68 teams), GODoc ranks 10th in Cellular Component Ontology. Regarding the species-specific task, the proposed method ranks 10th and 8th in the eukaryotic Cellular Component Ontology and the prokaryotic Molecular Function Ontology, respectively. In the term-centric task, GODoc ranks third for the biofilm formation of Pseudomonas aeruginosa and is tied for first for the long-term memory of Drosophila melanogaster.
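The general idea of k-nearest-neighbor voting for multi-label prediction can be sketched as follows. This is a minimal, generic version and not GODoc's actual algorithm: the cosine similarity, the similarity-weighted vote, the function names, and the toy vectors and labels are all assumptions made for illustration.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Set, Tuple

Example = Tuple[Sequence[float], Set[str]]  # (feature vector, set of labels)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_vote(query: Sequence[float], train: List[Example], k: int = 3) -> Dict[str, float]:
    """Score each label by the similarity-weighted share of the k nearest neighbors carrying it."""
    ranked = sorted(train, key=lambda ex: cosine(query, ex[0]), reverse=True)[:k]
    weights = [max(cosine(query, vec), 0.0) for vec, _ in ranked]
    total = sum(weights)
    if total == 0:
        return {}
    scores: Dict[str, float] = defaultdict(float)
    for weight, (_, labels) in zip(weights, ranked):
        for label in labels:
            scores[label] += weight / total  # each neighbor votes for all of its labels
    return dict(scores)


# Hypothetical toy training set: vectors stand in for reduced sequence features.
TRAIN: List[Example] = [
    ([1.0, 0.0, 0.0], {"term_A", "term_B"}),
    ([0.9, 0.1, 0.0], {"term_A"}),
    ([0.0, 1.0, 0.0], {"term_C"}),
    ([0.0, 0.0, 1.0], {"term_D"}),
]

if __name__ == "__main__":
    for label, score in sorted(knn_vote([1.0, 0.05, 0.0], TRAIN, k=2).items()):
        print(label, round(score, 3))
```

Because every neighbor can vote for several labels at once, a single query naturally receives a score for each candidate term, which is what makes the scheme suitable for multi-label problems.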

CONCLUSIONS

We have developed a novel and effective strategy for incorporating a training procedure into the k-nearest-neighbor algorithm (instance-based learning). The resulting method can solve the Gene Ontology multi-label prediction problem, which is especially notable given that there are thousands of Gene Ontology terms.
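One simple way to add a training step to an otherwise instance-based k-nearest-neighbor predictor is to tune a decision threshold on the training data itself. The sketch below does this with leave-one-out evaluation and micro-averaged F1. It is only an illustration of the idea: the abstract does not describe GODoc's training procedure, and the function names, the unweighted vote, the distance measure, and the toy data are assumptions.

```python
from typing import Dict, List, Sequence, Set, Tuple

Example = Tuple[Sequence[float], Set[str]]  # (feature vector, set of labels)


def vote_fractions(query: Sequence[float], pool: List[Example], k: int) -> Dict[str, float]:
    """Fraction of the k nearest neighbors (squared Euclidean distance) carrying each label."""
    nearest = sorted(pool, key=lambda ex: sum((q - x) ** 2 for q, x in zip(query, ex[0])))[:k]
    counts: Dict[str, int] = {}
    for _, labels in nearest:
        for label in labels:
            counts[label] = counts.get(label, 0) + 1
    return {label: c / len(nearest) for label, c in counts.items()} if nearest else {}


def micro_f1(predicted: List[Set[str]], truth: List[Set[str]]) -> float:
    """Micro-averaged F1 over all (example, label) pairs."""
    tp = sum(len(p & t) for p, t in zip(predicted, truth))
    if tp == 0:
        return 0.0
    precision = tp / sum(len(p) for p in predicted)
    recall = tp / sum(len(t) for t in truth)
    return 2 * precision * recall / (precision + recall)


def tune_threshold(train: List[Example], k: int, candidates: List[float]) -> Tuple[float, float]:
    """Pick the vote threshold with the best leave-one-out micro F1 on the training set."""
    truth = [labels for _, labels in train]
    best_t, best_f1 = candidates[0], -1.0
    for t in candidates:
        preds = []
        for i, (vec, _) in enumerate(train):
            pool = train[:i] + train[i + 1:]  # hold out the example being scored
            scores = vote_fractions(vec, pool, k)
            preds.append({label for label, s in scores.items() if s >= t})
        f1 = micro_f1(preds, truth)
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1


# Hypothetical toy data: two clusters with one label shared across them.
TRAIN: List[Example] = [
    ([0.0, 0.0], {"A"}), ([0.1, 0.0], {"A", "B"}), ([0.0, 0.1], {"A"}),
    ([1.0, 1.0], {"C"}), ([1.1, 1.0], {"C"}), ([1.0, 1.1], {"C", "B"}),
]

if __name__ == "__main__":
    print(tune_threshold(TRAIN, k=2, candidates=[0.5, 1.0]))
```

The same leave-one-out loop could be used to select k or per-label thresholds; a single global threshold is used here only to keep the example short.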

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/681d/7672824/e0306e452ee9/12859_2020_3556_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/681d/7672824/a08ed638ac34/12859_2020_3556_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/681d/7672824/ab6b451f1fc2/12859_2020_3556_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/681d/7672824/53c4bee0d73a/12859_2020_3556_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/681d/7672824/d5cd1c607ca0/12859_2020_3556_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/681d/7672824/8bc99bf794bc/12859_2020_3556_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/681d/7672824/6d0d24959d77/12859_2020_3556_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/681d/7672824/b71dd52ae4ea/12859_2020_3556_Fig8_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/681d/7672824/0c6bc1367f50/12859_2020_3556_Fig9_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/681d/7672824/476123c84058/12859_2020_3556_Fig10_HTML.jpg
