Biotechnology HPC Software Applications Institute, Telemedicine and Advanced Technology Research Center, US Army Medical Research and Materiel Command, Ft. Detrick, MD 21702, USA.
Mol Cell Proteomics. 2011 Dec;10(12):M111.012500. doi: 10.1074/mcp.M111.012500. Epub 2011 Aug 29.
We characterized and evaluated the functional attributes of three high-confidence yeast protein-protein interaction data sets derived from affinity purification/mass spectrometry, protein-fragment complementation assay, and yeast two-hybrid experiments. The interacting proteins retrieved from these data sets formed distinct, partially overlapping sets with different protein-protein interaction characteristics. These differences were primarily a function of the experimental technologies used to recover the interactions. The choice of technology affected the total coverage of interactions, and its effect was especially evident in the recovery of interactions among different functional classes of proteins. We found that the interaction data obtained by the yeast two-hybrid method were the least biased toward any particular functional characterization. In contrast, interacting proteins in the affinity purification/mass spectrometry and protein-fragment complementation assay data sets were over- and under-represented in distinct functional categories. We delineated how these differences affected protein complex organization in the network of interactions, in particular for strongly interacting complexes (e.g. RNA and protein synthesis) versus weakly and transiently interacting complexes (e.g. protein transport). We quantified methodological differences in detecting protein interactions from larger protein complexes, in the correlation of protein abundance among interacting proteins, and in the connectivity of essential proteins. In the latter case, we showed that minimizing inherent methodology biases removed many of the ambiguous conclusions about protein essentiality and protein connectivity. We used these findings to rationalize how biological insights obtained by analyzing data sets originating from different sources sometimes do not agree or may even contradict each other.
An important corollary of this work was that discrepancies in biological insights did not necessarily imply that one detection methodology was better or worse, but rather that, to a large extent, the insights reflected the methodological biases themselves. Consequently, interpreting the protein interaction data within their experimental or cellular context provided the best avenue for overcoming biases and inferring biological knowledge.