Cambridge Centre for Proteomics, Department of Biochemistry, University of Cambridge, CB2 1GA, UK.
Mol Cell Proteomics. 2013 Jan;12(1):1-13. doi: 10.1074/mcp.R112.019554. Epub 2012 Oct 15.
Advances in sensitivity, resolution, mass accuracy, and throughput have considerably increased the number of protein identifications made by mass spectrometry. Despite these advances, state-of-the-art experimental methods for studying protein-protein interactions yield more candidate interactions than would be expected biologically, owing to biases and limitations in the experimental methodology. In silico methods that distinguish between true and false interactions have been developed and applied successfully to reduce the number of false positives yielded by physical interaction assays. Such methods may be grouped according to: (1) the type of data used: methods based on experiment-specific measurements (e.g., spectral counts or identification scores) versus methods that extract knowledge encoded in external annotations (e.g., public interaction and functional categorisation databases); (2) the type of algorithm applied: statistical description and estimation of physical protein properties versus predictive supervised machine learning or text-mining algorithms; (3) the type of protein relation evaluated: direct (binary) interaction of two proteins in a co-complex versus the probability of any functional relationship between two proteins (e.g., co-occurrence in a pathway or subcellular compartment); and (4) the initial motivation: elucidation of experimental data by evaluation versus prediction of novel protein-protein interactions to be validated experimentally a posteriori. This work reviews several popular computational scoring methods and software platforms for protein-protein interaction evaluation according to their methodology, comparative strengths and weaknesses, data representation, accessibility, and availability. The scoring methods and platforms described include CompPASS, SAINT, Decontaminator, MINT, IntAct, STRING, and FunCoup.
References to related work are given throughout to offer a concise but thorough introduction to a rapidly growing interdisciplinary field of investigation.
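As an illustration of the first category above (scores computed from experiment-specific measurements such as spectral counts), the following is a minimal sketch of a per-prey Z-score in the spirit of the CompPASS Z-score. It is not the published CompPASS implementation. The input layout (a dict mapping bait to a dict of prey to spectral count), the function name `z_scores`, the treatment of preys absent from a bait as zero counts, and the use of the population standard deviation are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def z_scores(counts):
    """Hypothetical CompPASS-style Z-score sketch.

    counts: dict mapping bait -> dict mapping prey -> spectral count.
    Returns a dict mapping (bait, prey) -> Z-score of that prey's count in
    that bait, relative to the same prey's counts across all baits.
    Assumption: a prey not observed with a bait counts as 0 spectra.
    """
    baits = list(counts)
    preys = {prey for bait in baits for prey in counts[bait]}
    scores = {}
    for prey in preys:
        # Spectral counts of this prey across every bait (0 where absent).
        row = [counts[bait].get(prey, 0) for bait in baits]
        mu = mean(row)
        sd = pstdev(row)  # assumption: population, not sample, SD
        for bait, x in zip(baits, row):
            if x == 0:
                continue  # only score observed bait-prey pairs
            # A prey seen at similar levels with every bait gets sd ~ 0;
            # it carries no specificity signal, so score it 0.
            scores[(bait, prey)] = (x - mu) / sd if sd > 0 else 0.0
    return scores
```

For example, with `counts = {"A": {"P1": 10, "P2": 5}, "B": {"P2": 5}, "C": {"P2": 4}}`, the prey `P1`, seen only with bait `A`, scores about 1.41 for `A`, whereas the ubiquitous prey `P2` scores about 0.71 for `A`. This matches the intuition that frequently recurring preys are likely contaminants rather than specific interactors.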