Laboratoire TIMC-IMAG, BCM, CNRS UMR5525, Faculté de médecine, La Tronche, France.
BMC Bioinformatics. 2010 Dec 21;11:605. doi: 10.1186/1471-2105-11-605.
As protein interactions mediate most cellular mechanisms, protein-protein interaction networks are essential in the study of cellular processes. Consequently, several large-scale interactome mapping projects have been undertaken, and protein-protein interactions are being distilled into databases through literature curation; yet protein-protein interaction data are still far from comprehensive, even in the model organism Saccharomyces cerevisiae. Estimating the interactome size is important for evaluating the completeness of current datasets and thus for measuring how much effort remains.
We examined the yeast interactome from a new perspective, by taking into account how thoroughly proteins have been studied. We discovered that the set of literature-curated protein-protein interactions is qualitatively different when restricted to proteins that have received extensive attention from the scientific community. In particular, these interactions are less often supported by yeast two-hybrid assays and more often by more complex experiments such as biochemical activity assays. Our analysis showed that high-throughput and literature-curated interactome datasets are more correlated than commonly assumed, but that this bias can be corrected for by focusing on well-studied proteins. We thus propose a simple and reliable method to estimate the size of an interactome, combining literature-curated data involving well-studied proteins with high-throughput data. It yields an estimate of at least 37,600 direct physical protein-protein interactions in S. cerevisiae.
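The abstract does not spell out the estimator. One common way to turn dataset overlap into a size estimate is a capture-recapture (Lincoln-Petersen) style calculation: measure what fraction of a reference set the high-throughput (HT) data recovers, then divide the number of true HT interactions by that coverage. The Python sketch below assumes this approach. The function names, the `ht_precision` parameter, and the rule that both partners must be well studied are illustrative assumptions, not the authors' published procedure. It shows why the choice of literature-curated (LC) reference matters: if LC interactions among less-studied proteins were detected with the same kinds of assays as the HT screen, they inflate the overlap and therefore shrink the estimate.

```python
# Hypothetical sketch of an overlap-based (capture-recapture style) size estimate.
# Helper names and the "both partners well studied" rule are assumptions.


def pair(a: str, b: str) -> frozenset:
    """Build an undirected interaction between proteins a and b."""
    return frozenset((a, b))


def restrict(interactions: set, proteins: set) -> set:
    """Keep only interactions whose partners all belong to `proteins`."""
    return {p for p in interactions if p <= proteins}


def ht_coverage(ht: set, reference: set) -> float:
    """Fraction of reference interactions that the HT dataset also reports."""
    if not reference:
        raise ValueError("reference set is empty")
    return len(ht & reference) / len(reference)


def estimate_size(ht: set, lc: set, well_studied: set, ht_precision: float = 1.0) -> float:
    """Estimated interactome size = (true HT interactions) / (HT coverage).

    Coverage is measured only against LC interactions among well-studied
    proteins, assumed here to be a reference less biased toward HT assays.
    `ht_precision` (hypothetical) is the assumed fraction of HT calls that are real.
    """
    cov = ht_coverage(ht, restrict(lc, well_studied))
    if cov == 0:
        raise ValueError("no overlap between HT and reference; estimate undefined")
    return ht_precision * len(ht) / cov


# Toy data, invented for illustration only.
ht = {pair("A", "B"), pair("X", "Y"), pair("Y", "Z"), pair("E", "F")}
lc = {pair("A", "B"), pair("A", "C"), pair("A", "D"), pair("B", "C"),
      pair("X", "Y"), pair("Y", "Z")}
well_studied = {"A", "B", "C", "D"}
all_lc_proteins = {"A", "B", "C", "D", "X", "Y", "Z"}

print(estimate_size(ht, lc, well_studied))     # 16.0 (coverage 1/4)
print(estimate_size(ht, lc, all_lc_proteins))  # 8.0  (coverage 3/6)
```

In this toy data, using every LC interaction as the reference gives a coverage of 3/6 and an estimate of 8 interactions, while restricting the reference to well-studied proteins gives a coverage of 1/4 and an estimate of 16. This mirrors the direction of the effect the abstract describes, without reproducing its actual numbers.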
Our method leads to higher and more accurate estimates of the interactome size, as it accounts for interactions that are genuine yet difficult to detect with commonly used experimental assays. This shows that we are even further from completing the yeast interactome map than previously expected.