Integrating protein-protein interactions and text mining for protein function prediction.

Affiliation

Knowledge Management in Bioinformatics, Humboldt-University Berlin, Unter den Linden 6, 10099 Berlin, Germany.

Publication information

BMC Bioinformatics. 2008 Jul 22;9 Suppl 8(Suppl 8):S2. doi: 10.1186/1471-2105-9-S8-S2.

DOI: 10.1186/1471-2105-9-S8-S2

PMID: 18673526

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2500093/

Abstract

BACKGROUND

Functional annotation of proteins remains a challenging task. The scientific literature is currently the main source of functional annotations that have not yet been curated, but curation is slow and expensive, and the automatic techniques that support it still lack reliability. We developed a method to identify conserved protein interaction graphs and to predict missing protein functions from orthologs in these graphs. To improve the precision of the results, we also implemented a procedure that validates every prediction against findings reported in the literature.

RESULTS

Using this procedure, more than 80% of the GO annotations available in UniProtKB/Swiss-Prot for proteins with highly conserved orthologs could be verified automatically. For a subset of proteins, we predicted new GO annotations that were not available in UniProtKB/Swiss-Prot. All of these predictions were correct (100% precision) according to verification by a trained curator.

CONCLUSION

Our method of integrating conserved and connected subgraphs (CCSs) of protein interaction networks with literature mining is thus a highly reliable approach to predicting GO annotations for weakly characterized proteins that have orthologs.
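
The pipeline summarized in the abstract (find conserved interactions, transfer missing GO terms between orthologs, keep only predictions supported by the literature) can be illustrated with a small sketch. This is not the authors' implementation: the function names, the one-to-one ortholog mapping, and the reduction of conserved subgraphs to single conserved edges are all simplifying assumptions, and the literature evidence is assumed to arrive as a ready-made mapping from protein to GO terms found by text mining.

```python
def conserved_edges(edges_a, edges_b, orthologs):
    """Edges of species A whose orthologous endpoints also interact in species B.

    orthologs: hypothetical one-to-one mapping, protein in A -> protein in B.
    """
    b_edges = {frozenset(edge) for edge in edges_b}
    kept = []
    for u, v in edges_a:
        if u in orthologs and v in orthologs:
            if frozenset((orthologs[u], orthologs[v])) in b_edges:
                kept.append((u, v))
    return kept


def predict_from_orthologs(conserved, orthologs, go_a, go_b):
    """Propose GO terms that an ortholog carries but the protein itself lacks."""
    predictions = {}
    for edge in conserved:
        for p in edge:
            q = orthologs[p]
            terms_p = go_a.get(p, set())
            terms_q = go_b.get(q, set())
            if terms_q - terms_p:
                predictions.setdefault(p, set()).update(terms_q - terms_p)
            if terms_p - terms_q:
                predictions.setdefault(q, set()).update(terms_p - terms_q)
    return predictions


def validate_with_literature(predictions, literature_terms):
    """Keep only predicted terms that text mining also found for that protein."""
    validated = {}
    for protein, terms in predictions.items():
        supported = terms & literature_terms.get(protein, set())
        if supported:
            validated[protein] = supported
    return validated


# Toy data (invented identifiers, for illustration only).
edges_yeast = [("yA", "yB"), ("yB", "yC")]
edges_human = [("hA", "hB")]
orthologs = {"yA": "hA", "yB": "hB", "yC": "hC"}
go_yeast = {"yA": {"GO:0006260"}, "yB": {"GO:0006260", "GO:0005634"}}
go_human = {"hA": {"GO:0006260", "GO:0005634"}, "hB": {"GO:0006260"}}
literature = {"yA": {"GO:0005634"}, "hB": {"GO:0008152"}}

conserved = conserved_edges(edges_yeast, edges_human, orthologs)
predicted = predict_from_orthologs(conserved, orthologs, go_yeast, go_human)
validated = validate_with_literature(predicted, literature)
print(validated)
```

In this toy run only the yA-yB interaction is conserved, both yA and hB receive a candidate term from their orthologs, and only the yA prediction survives because the (invented) literature evidence supports it. The literature filter trades recall for precision, which matches the emphasis the abstract places on precision.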

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7f1c/2500093/0c9af1da46fb/1471-2105-9-S8-S2-1.jpg
