Mika Sven, Rost Burkhard
Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York, USA.
PLoS Comput Biol. 2006 Jul 21;2(7):e79. doi: 10.1371/journal.pcbi.0020079. Epub 2006 May 18.
Experimental high-throughput studies of protein-protein interactions are beginning to provide enough data for comprehensive computational studies. Today, about ten large data sets, each with thousands of interacting pairs, coarsely sample the interactions in fly, human, worm, and yeast. About 55,000 additional pairs of interacting proteins have been identified by more careful, detailed biochemical experiments. Most interactions are experimentally observed in prokaryotes and simple eukaryotes; very few interactions are observed in higher eukaryotes such as mammals. It is commonly assumed that pathways in mammals can be inferred through homology to model organisms; e.g., the experimental observation that two yeast proteins interact is transferred to infer that the two corresponding proteins in human also interact. Two pairs for which the interaction is conserved are often described as interologs. The goal of this investigation was a large-scale comprehensive analysis of such inferences, i.e., of the evolutionary conservation of interologs. Here, we introduced a novel score for measuring the overlap between protein-protein interaction data sets. This measure appeared to reflect the overall quality of the data and was the basis for the two surprising results of our large-scale analysis. Firstly, homology-based inferences of physical protein-protein interactions appeared far less successful than expected. In fact, such inferences were accurate only for extremely high levels of sequence similarity. Secondly, and most surprisingly, the identification of interacting partners through sequence similarity was significantly more reliable for protein pairs within the same organism than for pairs between species. Our analysis underlined that the discrepancies between different data sets are large, even when using the same type of experiment on the same organism. This reality considerably constrains the power of homology-based transfer of interactions. In particular, the experimental probing of interactions in distant model organisms has to be undertaken with some caution. More comprehensive images of protein-protein networks will require the combination of many high-throughput methods, including in silico inferences and predictions. http://www.rostlab.org/results/2006/ppi_homology/
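The homology-based transfer described in the abstract (an interaction observed between two proteins in one organism is carried over to the corresponding proteins in another) can be illustrated with a minimal sketch. This is not the authors' method or their overlap score, neither of which is specified in the abstract. The function name `transfer_interologs`, the homolog mapping, and all protein identifiers are hypothetical, and the toy data are invented for illustration only.

```python
from itertools import product


def transfer_interologs(source_pairs, homologs):
    """Naively infer interactions in a target organism by homology.

    source_pairs: iterable of (a, b) protein pairs observed to interact
                  in the source organism.
    homologs:     dict mapping each source protein to a set of its
                  homologs in the target organism (assumed to be given,
                  e.g. from a sequence-similarity search).
    Returns a set of unordered target pairs, each stored as a frozenset.
    """
    inferred = set()
    for a, b in source_pairs:
        # Every combination of homologs of a and b is a candidate interolog.
        for x, y in product(homologs.get(a, ()), homologs.get(b, ())):
            if x != y:  # simplification: self-interactions are ignored
                inferred.add(frozenset((x, y)))
    return inferred


# Toy example with invented identifiers (not real data).
yeast_pairs = [("yA", "yB"), ("yB", "yC")]
yeast_to_human = {"yA": {"hA"}, "yB": {"hB1", "hB2"}}  # yC has no homolog

predicted = transfer_interologs(yeast_pairs, yeast_to_human)
for pair in sorted(sorted(p) for p in predicted):
    print(pair)
```

The sketch applies no sequence-similarity threshold. Since the abstract reports that such inferences were accurate only at extremely high levels of sequence similarity, a practical version would presumably restrict the homolog mapping to very close homologs before transferring any interaction.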