Genome-wide inference of protein interaction sites: lessons from the yeast high-quality negative protein-protein interaction dataset.

Author information

Guo Jie, Wu Xiaomei, Zhang Da-Yong, Lin Kui

Affiliation

MOE Key Laboratory for Biodiversity Science and Ecological Engineering and College of Life Sciences, Beijing Normal University, Beijing 100875, China.

Publication information

Nucleic Acids Res. 2008 Apr;36(6):2002-11. doi: 10.1093/nar/gkn016. Epub 2008 Feb 14.

DOI:10.1093/nar/gkn016

PMID:18281313

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC2346601/

Abstract

High-throughput studies of protein interactions, both experimental and computational, may have produced the most comprehensive protein-protein interaction datasets available for completely sequenced genomes. These datasets provide an opportunity to discover the underlying protein interaction patterns on a proteome scale. Here, we propose an approach for discovering motif pairs at interaction sites (often 3-8 residues) that are essential for understanding protein functions and helpful for the rational design of protein engineering and folding experiments. A gold standard positive (interacting) dataset and a gold standard negative (non-interacting) dataset were mined to infer the interacting motif pairs that are significantly overrepresented in the positive dataset relative to the negative dataset. Four negative datasets assembled by different strategies were evaluated, and the one with the best performance was used as the gold standard negative set for further analysis. To assess how well our method detects potential interacting motif pairs, we compared it with previously developed approaches and found that it achieved the highest prediction accuracy. In addition, many uncharacterized motif pairs of interest were found to be functional, with experimental evidence in other species. This investigation demonstrates the important effect of a high-quality negative dataset on the performance of such statistical inference.
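
The idea of scoring motif pairs by their overrepresentation in interacting versus non-interacting protein pairs can be illustrated with a small sketch. The abstract does not state which statistic the authors used, so the one-sided hypergeometric test below is an assumption chosen for illustration, as are the fixed motif length `k`, the treatment of motifs as exact substrings, and all function names (`kmers`, `motif_pair_counts`, `enrichment_p_value`). It is not the authors' implementation.

```python
from itertools import product
from math import comb


def kmers(seq, k):
    """Return the set of all length-k substrings of a protein sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def motif_pair_counts(protein_pairs, k):
    """Count how many protein pairs contain each unordered motif pair.

    A protein pair (a, b) contains the motif pair {m1, m2} when one motif
    occurs in a and the other in b. Each protein pair is counted at most
    once per motif pair.
    """
    counts = {}
    for a, b in protein_pairs:
        seen = {frozenset((m1, m2))
                for m1, m2 in product(kmers(a, k), kmers(b, k))}
        for key in seen:
            counts[key] = counts.get(key, 0) + 1
    return counts


def enrichment_p_value(pos_hits, neg_hits, n_pos, n_neg):
    """One-sided hypergeometric tail P(X >= pos_hits).

    Illustrative stand-in statistic (an assumption, not taken from the
    paper): the chance of seeing at least pos_hits of the motif-pair
    occurrences in the positive set if occurrences were spread at random
    over all n_pos + n_neg protein pairs.
    """
    total = n_pos + n_neg
    hits = pos_hits + neg_hits
    upper = min(hits, n_pos)
    tail = sum(comb(hits, i) * comb(total - hits, n_pos - i)
               for i in range(pos_hits, upper + 1))
    return tail / comb(total, n_pos)


def score_motif_pair(motif_pair, positives, negatives, k):
    """Return (pos_hits, neg_hits, p_value) for one unordered motif pair."""
    key = frozenset(motif_pair)
    pos_hits = motif_pair_counts(positives, k).get(key, 0)
    neg_hits = motif_pair_counts(negatives, k).get(key, 0)
    p = enrichment_p_value(pos_hits, neg_hits, len(positives), len(negatives))
    return pos_hits, neg_hits, p
```

In this toy setup the choice of negative set directly changes `neg_hits` and therefore the p-value, which is the dependence on negative-set quality that the abstract emphasizes. A real analysis would also need multiple-testing correction across all candidate motif pairs, which is omitted here.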
