False positive reduction in protein-protein interaction predictions using gene ontology annotations.

Author information

Mahdavi Mahmoud A, Lin Yen-Han

Affiliation

Department of Chemical Engineering, University of Saskatchewan, Saskatoon, SK, Canada.

Publication information

BMC Bioinformatics. 2007 Jul 23;8:262. doi: 10.1186/1471-2105-8-262.

DOI:10.1186/1471-2105-8-262

PMID:17645798

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC1941744/

Abstract

BACKGROUND

Many crucial cellular operations, such as metabolism, signalling, and regulation, are based on protein-protein interactions. However, robust protein-protein interaction information is lacking. One reason is poor agreement between experimental findings and computational predictions, which in turn stems from the large number of false positives produced by computational approaches. Reducing false positive predictions, and thereby enhancing the true positive fraction of computationally predicted protein-protein interaction datasets on the basis of highly confident experimental results, has not been adequately investigated.

RESULTS

Gene Ontology (GO) annotations were used to reduce false positive protein-protein interaction (PPI) pairs resulting from computational predictions. Using experimentally obtained PPI pairs as a training dataset, eight top-ranking keywords were extracted from GO molecular function annotations. The sensitivity of these keywords is 64.21% in the yeast experimental dataset and 80.83% in the worm experimental dataset. The specificities of these keywords, a measure of recovery power, when applied to four predicted PPI datasets for each organism studied, are 48.32% in yeast and 46.49% in worm (averaged over the four datasets). Based on the eight top-ranking keywords and the co-localization of interacting proteins, a set of two knowledge rules was deduced and applied to remove false positive protein pairs. The 'strength', a measure of the improvement provided by the rules, was defined based on the signal-to-noise ratio and used to measure the applicability of the knowledge rules to the predicted PPI datasets. Depending on the PPI-predicting method employed, the strength is between two and ten times that obtained by randomly removing protein pairs from the datasets.
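The keyword-and-co-localization filtering described above can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration: the abstract does not list the eight keywords or state the two knowledge rules, so the keyword set, the "both proteins must match" criterion, the shared-compartment test, and all function and variable names below are hypothetical stand-ins rather than the authors' method.

```python
from typing import Dict, Iterable, List, Set, Tuple

Pair = Tuple[str, str]

def has_keyword(terms: Iterable[str], keywords: Set[str]) -> bool:
    """True if any GO molecular function term contains one of the keywords."""
    return any(kw in term.lower() for term in terms for kw in keywords)

def pair_matches(pair: Pair, mf_terms: Dict[str, List[str]], keywords: Set[str]) -> bool:
    """Assumed criterion: both proteins carry at least one keyword-bearing term."""
    a, b = pair
    return has_keyword(mf_terms.get(a, ()), keywords) and has_keyword(mf_terms.get(b, ()), keywords)

def keyword_coverage(pairs: Iterable[Pair], mf_terms: Dict[str, List[str]], keywords: Set[str]) -> float:
    """Fraction of pairs that satisfy the keyword criterion."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    return sum(pair_matches(p, mf_terms, keywords) for p in pairs) / len(pairs)

def co_localized(pair: Pair, locations: Dict[str, Set[str]]) -> bool:
    """Assumed criterion: the two proteins share at least one cellular compartment."""
    a, b = pair
    return bool(locations.get(a, set()) & locations.get(b, set()))

def filter_predictions(pairs, mf_terms, locations, keywords) -> List[Pair]:
    """Keep only pairs that pass both illustrative rules."""
    return [p for p in pairs if pair_matches(p, mf_terms, keywords) and co_localized(p, locations)]

# Toy data; the protein IDs, annotations, and keywords are invented for this example.
keywords = {"binding", "kinase"}
mf_terms = {
    "P1": ["ATP binding"],
    "P2": ["protein kinase activity"],
    "P3": ["structural constituent of ribosome"],
    "P4": ["DNA binding"],
}
locations = {
    "P1": {"nucleus"},
    "P2": {"nucleus", "cytoplasm"},
    "P3": {"cytoplasm"},
    "P4": {"mitochondrion"},
}
predicted = [("P1", "P2"), ("P1", "P3"), ("P1", "P4")]

kept = filter_predictions(predicted, mf_terms, locations, keywords)
coverage = keyword_coverage(predicted, mf_terms, keywords)
```

In this toy run only `("P1", "P2")` survives: `("P1", "P3")` fails the keyword criterion and `("P1", "P4")` fails the co-localization criterion. Applying `keyword_coverage` to a set of experimentally supported pairs would give a sensitivity-like figure under the same assumed criterion.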

CONCLUSION

Gene Ontology annotations, along with the deduced knowledge rules, can be used to partially remove falsely predicted PPI pairs. Removing false positives from predicted datasets increases their true positive fraction, improves the robustness of the predicted pairs relative to random protein pairing, and eventually results in better overlap with experimental results.
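A short calculation shows when removing pairs raises the true positive fraction. The symbols are introduced here for illustration and do not come from the paper: suppose a predicted dataset holds $T$ true and $F$ false pairs, and a filter removes $t$ true and $f$ false pairs, with $t + f < T + F$.

```latex
\[
\frac{T - t}{(T + F) - (t + f)} > \frac{T}{T + F}
\;\Longleftrightarrow\;
T\,(t + f) > t\,(T + F)
\;\Longleftrightarrow\;
\frac{t}{t + f} < \frac{T}{T + F}
\]
```

In words, the true positive fraction rises exactly when the removed set contains a smaller share of true pairs than the dataset as a whole. Random removal leaves the fraction unchanged on average, which is why it serves as a natural baseline for comparison.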
