Han Dongsoo, Kim Hong-Soog, Seo Jungmin, Jang Woohyuk
School of Engineering, Information and Communications University, PO Box 77, Yusong, Daejeon 305-600, Korea.
Genome Inform. 2003;14:250-9.
In this paper, we propose a probabilistic framework for predicting the interaction probability of proteins. We introduce the notions of domain combination and domain combination pair, and the prediction model takes the domain combination pair as the basic unit of protein interaction to overcome the limitations of conventional domain-pair-based prediction systems. The framework consists mainly of a prediction preparation stage and a prediction service stage. In the prediction preparation stage, two appearance probability matrices are constructed; they hold the appearance frequencies of domain combination pairs in the interacting and non-interacting sets of protein pairs, respectively. Based on these matrices, a probability equation is devised that maps a protein pair to a real number in the range 0 to 1. Applying the equation to the interacting and non-interacting sets of protein pairs yields two distributions. In the prediction service stage, the interaction probability of a protein pair is predicted using these distributions and the equation. The validity of the prediction model is evaluated on the set of interacting protein pairs in yeast and an artificially generated set of non-interacting protein pairs. When 80% of the interacting protein pairs in DIP (Database of Interacting Proteins) are used as the learning set, the framework achieves very high sensitivity (86%) and moderate specificity (56%).
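The preparation stage described above can be illustrated with a minimal sketch. The abstract does not state the probability equation, so everything below is an assumption made for illustration only: a domain combination is taken to be any non-empty subset of a protein's domains, the score is taken to be a simple ratio of summed appearance frequencies, and the function names and single-letter domain labels are hypothetical. The step that derives distributions from the scores is not shown.

```python
from collections import Counter
from itertools import combinations, product


def domain_combinations(domains):
    """All non-empty subsets of a protein's domains (assumed definition)."""
    ds = sorted(set(domains))
    return [frozenset(c) for r in range(1, len(ds) + 1)
            for c in combinations(ds, r)]


def combination_pairs(domains_a, domains_b):
    """Unordered domain combination pairs for one protein pair."""
    return {frozenset((a, b))
            for a, b in product(domain_combinations(domains_a),
                                domain_combinations(domains_b))}


def appearance_frequencies(protein_pairs):
    """Relative frequency of each domain combination pair in a set of protein pairs."""
    counts = Counter()
    for domains_a, domains_b in protein_pairs:
        counts.update(combination_pairs(domains_a, domains_b))
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()} if total else {}


def score(domains_a, domains_b, freq_int, freq_non):
    """Map a protein pair to [0, 1]. Assumed ratio form, not the paper's equation."""
    pairs = combination_pairs(domains_a, domains_b)
    s_int = sum(freq_int.get(p, 0.0) for p in pairs)
    s_non = sum(freq_non.get(p, 0.0) for p in pairs)
    if s_int + s_non == 0:
        return 0.5  # no evidence either way (an arbitrary choice for this sketch)
    return s_int / (s_int + s_non)


# Toy learning sets; the domain labels A-D are placeholders, not real domains.
interacting = [({"A"}, {"B"}), ({"A", "C"}, {"B"})]
non_interacting = [({"C"}, {"D"})]

freq_int = appearance_frequencies(interacting)
freq_non = appearance_frequencies(non_interacting)

print(score({"A"}, {"B"}, freq_int, freq_non))
print(score({"C"}, {"B", "D"}, freq_int, freq_non))
```

Scoring every pair in the two learning sets with such a function would give the two score distributions that the prediction service stage relies on.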