Bioinformatics Project, National Institute of Biomedical Innovation, 7-6-8 Saito-Asagi, Ibaraki, Osaka 567-0085, Japan.
BMC Bioinformatics. 2014 Jun 23;15:213. doi: 10.1186/1471-2105-15-213.
Identification of protein-protein interactions (PPIs) is essential for a better understanding of biological processes, pathways and functions. However, experimental identification of the complete set of PPIs in a cell/organism ("an interactome") is still a difficult task. To circumvent limitations of current high-throughput experimental techniques, it is necessary to develop high-performance computational methods for predicting PPIs.
In this article, we propose a new computational method that predicts whether a given pair of protein sequences interacts, using features derived from known homologous PPIs. The method can predict an interaction between two proteins of unknown structure using Averaged One-Dependence Estimators (AODE) and three features calculated for the protein pair: (a) sequence similarities to a known interacting protein pair (FSeq), (b) statistical propensities of domain pairs observed in interacting proteins (FDom) and (c) the sum of edge weights along the shortest path between homologous proteins in a PPI network (FNet). Feature vectors were defined to lie in a half-space of the symmetrical high-dimensional feature space to make them independent of the protein order. The predictive performance of the method was assessed by 10-fold cross-validation on a recently created human PPI dataset with randomly sampled negative data, and the best model achieved an Area Under the Curve (AUC) of 0.79 (pAUC0.5% = 0.16). In addition, the AODE trained on all three features (named PSOPIA) showed better prediction performance on a separate independent data set than a recently reported homology-based method.
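The order-independence requirement mentioned above can be illustrated with a minimal sketch. It assumes each protein contributes its own tuple of numeric feature values and that the pair vector is their concatenation; the function name `canonical_pair_features` and the lexicographic tie-breaking rule are hypothetical choices made for illustration, not the definition used in PSOPIA. Sorting the two per-protein tuples before concatenating them confines every pair vector to one side of the symmetric feature space, so (A, B) and (B, A) map to the same point.

```python
from typing import Sequence, Tuple


def canonical_pair_features(
    feats_a: Sequence[float], feats_b: Sequence[float]
) -> Tuple[float, ...]:
    """Return an order-independent feature vector for a protein pair.

    Assumption (illustrative only): the pair vector is the concatenation
    of two per-protein feature tuples. Placing the lexicographically
    larger tuple first keeps all vectors in one half of the symmetric
    space, so swapping the proteins does not change the result.
    """
    a, b = tuple(feats_a), tuple(feats_b)
    if len(a) != len(b):
        raise ValueError("both proteins need the same number of features")
    first, second = (a, b) if a >= b else (b, a)
    return first + second


# Hypothetical per-protein feature values.
protein_x = [0.2, 0.9]
protein_y = [0.7, 0.1]

vec_xy = canonical_pair_features(protein_x, protein_y)
vec_yx = canonical_pair_features(protein_y, protein_x)
```

With this canonical form, a classifier sees a single representation per unordered pair, so no duplicated (swapped) training examples are needed.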
Our results suggest that FNet, a feature representing proximity in a known PPI network between two proteins that are homologous to a target protein pair, contributes to the prediction of whether the target proteins interact or not. PSOPIA will help identify novel PPIs and estimate complete PPI networks. The method proposed in this article is freely available on the web at http://mizuguchilab.org/PSOPIA.
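To make the idea behind FNet concrete, the following sketch computes the sum of edge weights along the shortest path between two nodes of a weighted, undirected network using Dijkstra's algorithm. The toy network, its edge weights, and the function name `shortest_path_weight` are assumptions made for illustration; how PSOPIA assigns edge weights and maps target proteins to their homologs in the network is not specified here.

```python
import heapq
import math
from typing import Dict

# Adjacency map: node -> {neighbour: non-negative edge weight}.
Graph = Dict[str, Dict[str, float]]


def shortest_path_weight(graph: Graph, source: str, target: str) -> float:
    """Sum of edge weights along the shortest path (Dijkstra).

    Returns math.inf when either node is missing or no path exists.
    """
    if source not in graph or target not in graph:
        return math.inf
    best = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        dist, node = heapq.heappop(heap)
        if node == target:
            return dist
        if dist > best.get(node, math.inf):
            continue  # stale heap entry
        for neighbour, weight in graph[node].items():
            candidate = dist + weight
            if candidate < best.get(neighbour, math.inf):
                best[neighbour] = candidate
                heapq.heappush(heap, (candidate, neighbour))
    return math.inf


# Hypothetical toy PPI network; nodes stand for homologs of the targets.
toy_network: Graph = {
    "A": {"B": 0.5, "C": 1.0},
    "B": {"A": 0.5, "C": 0.25},
    "C": {"A": 1.0, "B": 0.25},
    "D": {},  # isolated protein
}

fnet_ac = shortest_path_weight(toy_network, "A", "C")
fnet_ad = shortest_path_weight(toy_network, "A", "D")
```

In this toy network the indirect route A-B-C (0.5 + 0.25) is shorter than the direct edge A-C (1.0), and the isolated node D is unreachable, which is reported as infinity.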