Effect of reference genome selection on the performance of computational methods for genome-wide protein-protein interaction prediction.

Author information

Computational and Functional Genomics Group, Centre for DNA Fingerprinting and Diagnostics, Hyderabad, Andhra Pradesh, India.

Publication information

PLoS One. 2012;7(7):e42057. doi: 10.1371/journal.pone.0042057. Epub 2012 Jul 26.

DOI:10.1371/journal.pone.0042057

PMID:22844541

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3406042/

Abstract

BACKGROUND

Recent progress in computational methods for predicting physical and functional protein-protein interactions has provided new insights into the complexity of biological processes. Most of these methods assume that functionally interacting proteins are likely to share an evolutionary history. For the protein pairs of a query genome, this history can be traced by correlating different evolutionary aspects of their homologs across multiple genomes, known as the reference genomes. These methods include phylogenetic profiling, gene neighborhood, and co-occurrence of the orthologous protein-coding genes in the same cluster or operon; they are collectively known as genomic context methods. In contrast, a method called mirrortree is based on the similarity between the phylogenetic trees of two interacting proteins. Comprehensive performance analyses of these methods have frequently been reported in the literature. However, very few studies provide insight into the effect of reference genome selection on the detection of meaningful protein interactions.
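The two ideas named above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the use of Pearson correlation as the similarity measure, and the toy profiles and distance matrices are all assumptions made for illustration. A phylogenetic profile is treated as a 0/1 vector of homolog presence across reference genomes, and the mirrortree-style score correlates the pairwise evolutionary distances of two protein families over the same ordered set of genomes.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant vector carries no co-variation signal
    return cov / sqrt(vx * vy)


def profile_similarity(profile_a, profile_b):
    """Similarity of two presence/absence profiles (assumed measure: Pearson)."""
    return pearson(profile_a, profile_b)


def mirrortree_score(dist_a, dist_b):
    """Correlate the upper triangles of two distance matrices.

    Both matrices are nested lists indexed by the same ordered genomes.
    """
    n = len(dist_a)
    upper_a = [dist_a[i][j] for i in range(n) for j in range(i + 1, n)]
    upper_b = [dist_b[i][j] for i in range(n) for j in range(i + 1, n)]
    return pearson(upper_a, upper_b)


# Hypothetical profiles over six reference genomes (1 = homolog present).
p1 = [1, 1, 0, 0, 1, 0]
p2 = [1, 1, 0, 0, 1, 0]
p3 = [0, 0, 1, 1, 0, 1]

# Hypothetical distance matrices for two protein families over three genomes.
d1 = [[0, 1, 2], [1, 0, 3], [2, 3, 0]]
d2 = [[0, 2, 4], [2, 0, 6], [4, 6, 0]]

same = profile_similarity(p1, p2)
opposite = profile_similarity(p1, p3)
tree = mirrortree_score(d1, d2)
```

Identical profiles score at the top of the range and complementary profiles at the bottom; the choice of reference genomes determines the length and content of every vector, which is why genome selection can change the scores.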

METHODS

We analyzed the performance of four methods and their variants to understand the effect of reference genome selection on prediction efficacy. We used six sets of reference genomes, sampled from 565 bacterial genomes according to phylogenetic diversity and the relationships between organisms. We used Escherichia coli as a model organism and compared the performance of the prediction methods against gold standard datasets of interacting proteins reported in the DIP, EcoCyc and KEGG databases.
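The evaluation described above can be sketched as follows, under stated assumptions. Jaccard similarity, the fixed threshold, the protein labels, the profiles and the gold standard pairs are all hypothetical and chosen only to show the mechanics; the study's actual scoring functions and datasets are not reproduced here. The sketch restricts each profile to a chosen subset of reference genomes, predicts pairs above a threshold, and measures precision and recall against the gold standard.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two presence/absence vectors."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0


def restrict(profiles, genome_idx):
    """Keep only the columns for the selected reference genomes."""
    return {p: [v[i] for i in genome_idx] for p, v in profiles.items()}


def predict_pairs(profiles, threshold):
    """Predict every protein pair whose profile similarity meets the threshold."""
    return {
        frozenset((p, q))
        for p, q in combinations(sorted(profiles), 2)
        if jaccard(profiles[p], profiles[q]) >= threshold
    }


def precision_recall(predicted, gold):
    """Precision and recall of predicted pairs against gold standard pairs."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical profiles of four proteins over six reference genomes.
profiles = {
    "A": [1, 1, 0, 1, 0, 0],
    "B": [1, 1, 0, 1, 0, 1],
    "C": [0, 0, 1, 0, 1, 1],
    "D": [0, 1, 1, 0, 1, 0],
}
# Hypothetical gold standard interactions.
gold = {frozenset(("A", "B")), frozenset(("C", "D"))}

full_set = precision_recall(predict_pairs(profiles, 0.6), gold)
subset = precision_recall(predict_pairs(restrict(profiles, [0, 2, 4]), 0.6), gold)
```

Running the same evaluation over each candidate reference set gives directly comparable precision and recall values; the toy numbers here say nothing about real genomes.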

CONCLUSIONS

Higher performance in predicting protein-protein interactions was achievable even with 100-150 bacterial genomes out of the 565 genomes. Inclusion of archaeal genomes in the reference genome set improved performance. We find that, to obtain good performance, it is better to sample a few genomes of related genera of prokaryotes from the large number of available genomes. Moreover, such sampling allows 50-100 genomes to be selected for comparable prediction accuracy when computational resources are limited.
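One way to read the sampling recommendation above is as a per-genus draw, sketched below in Python. This is an assumed interpretation, not the procedure used in the study: the function name, the `per_genus` and `seed` parameters, and the small genome-to-genus table are illustrative only.

```python
import random
from collections import defaultdict


def sample_per_genus(genome_to_genus, per_genus=1, seed=0):
    """Pick up to `per_genus` genomes from each genus, reproducibly."""
    rng = random.Random(seed)  # fixed seed so the sample can be repeated
    by_genus = defaultdict(list)
    for genome, genus in sorted(genome_to_genus.items()):
        by_genus[genus].append(genome)
    chosen = []
    for genus in sorted(by_genus):
        members = by_genus[genus]
        k = min(per_genus, len(members))
        chosen.extend(rng.sample(members, k))
    return chosen


# Illustrative genome-to-genus table; a real run would cover hundreds of genomes.
genomes = {
    "Escherichia coli K-12": "Escherichia",
    "Escherichia coli O157:H7": "Escherichia",
    "Salmonella enterica LT2": "Salmonella",
    "Bacillus subtilis 168": "Bacillus",
    "Bacillus anthracis Ames": "Bacillus",
}

reference_set = sample_per_genus(genomes, per_genus=1)
```

Capping the number of genomes per genus keeps densely sequenced genera from dominating the reference set, which is one plausible way to reach a smaller set of genomes when computational resources are limited.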

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/88fa/3406042/596d6386f4e0/pone.0042057.g001.jpg
