Nouretdinov Ilia, Gammerman Alex, Qi Yanjun, Klein-Seetharaman Judith
Computer Learning Research Centre, Royal Holloway University of London, London, UK.
Pac Symp Biocomput. 2012:311-22.
Identifying protein-protein interactions (PPIs) is critical for understanding virtually all cellular molecular mechanisms. Previously, predicting PPIs has been treated as a binary classification task and has commonly been solved in a supervised setting, which requires a positive labeled set of known PPIs and a negative labeled set of non-interacting protein pairs. In those methods, the learner provides the likelihood of the predicted interaction, but without a confidence level associated with each prediction. Here, we apply a conformal prediction (CP) framework to make predictions and to estimate the confidence of those predictions. The conformal predictor uses a function measuring the relative 'strangeness' of interacting pairs to check whether a new example, added to the sequence of already known PPIs, would conform to the 'exchangeability' assumption: the distribution of interacting pairs is invariant under any permutation of the pairs. In fact, this is the only assumption we make about the data. Another advantage is that the user can control the number of errors by specifying a desired confidence level. This feature of CP is very useful for producing a ranked list of possible interacting pairs. In this paper, the conformal method has been developed to deal with just one class, the class of interacting proteins, because there is no clearly defined class of 'non-interacting' pairs. The confidence level helps the biologist interpret the results and better supports the choice of pairs for experimental validation. We apply the proposed conformal framework to improve the identification of interacting pairs between HIV-1 and human proteins.
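The one-class conformal idea described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the strangeness function or the feature representation, so everything concrete here is an assumption made for illustration: protein pairs are represented as hypothetical numeric feature vectors, strangeness is taken to be the mean Euclidean distance to the k nearest other pairs, and the function names (`knn_strangeness`, `conformal_p_value`, `predict_interacting`) are invented for this example rather than taken from the paper.

```python
import math


def knn_strangeness(index, points, k=1):
    """Assumed strangeness measure: mean distance from points[index]
    to its k nearest neighbours among the other points."""
    target = points[index]
    distances = sorted(
        math.dist(target, other)
        for j, other in enumerate(points)
        if j != index
    )
    return sum(distances[:k]) / k


def conformal_p_value(known_pairs, candidate, k=1):
    """p-value for the hypothesis that `candidate` is an interacting pair.

    The candidate is appended to the known interacting pairs, every
    example in the augmented set is scored, and the p-value is the
    fraction of examples at least as strange as the candidate.
    """
    augmented = list(known_pairs) + [candidate]
    scores = [knn_strangeness(i, augmented, k) for i in range(len(augmented))]
    candidate_score = scores[-1]
    at_least_as_strange = sum(1 for s in scores if s >= candidate_score)
    return at_least_as_strange / len(augmented)


def predict_interacting(known_pairs, candidates, confidence=0.9, k=1):
    """Keep the candidates whose p-value exceeds 1 - confidence."""
    epsilon = 1.0 - confidence
    return [
        c for c in candidates
        if conformal_p_value(known_pairs, c, k) > epsilon
    ]


# Toy feature vectors standing in for known interacting pairs (hypothetical data).
known = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
p_near = conformal_p_value(known, (0.5, 0.5))
p_far = conformal_p_value(known, (10.0, 10.0))
accepted = predict_interacting(known, [(0.5, 0.5), (10.0, 10.0)], confidence=0.7)
```

Under the exchangeability assumption, a genuinely interacting pair receives a p-value of at most ε with probability at most ε, which is the sense in which the confidence level controls the error rate. Sorting candidates by p-value also gives a ranked list of the kind the abstract mentions. Note that with n known pairs the smallest attainable p-value is 1/(n+1), so a toy set of four examples cannot support confidence levels above 0.8.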