Liu Guimei, Li Jinyan, Wong Limsoon
School of Computing, National University of Singapore, Singapore.
Genome Inform. 2008;21:138-49.
High-throughput protein interaction data, which keep growing in volume, are becoming the foundation of many biological discoveries. However, such data often have high false positive and false negative rates, so scalable methods for identifying these errors are needed. In this paper, we develop a computational method to identify spurious and missing interactions in high-throughput protein interaction data. Our method uses both local and global topological information about protein pairs, assigning a local interacting score and a global interacting score to every protein pair. The local interacting score is calculated from the common neighbors of the two proteins in a pair. The global interacting score is computed using pairs of globally interacting protein groups. The two scores are then combined into a final score, called LGTweight, that indicates how likely the two proteins are to interact. We tested our method on the DIP yeast interaction dataset. The experimental results show that the interactions ranked highest by our method have higher functional homogeneity and localization coherence than those ranked highest by existing methods, and that our method also achieves higher sensitivity and precision than existing methods under 5-fold cross-validation.
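The abstract does not give the formula for the local interacting score, only that it is based on common neighbors. As a rough illustration of how a common-neighbor score can flag both kinds of error, the sketch below uses the Jaccard overlap of two proteins' neighborhoods as an assumed stand-in. It is not LGTweight: the global group-pair score and the combination step are not modeled. All function names and the toy edge list are hypothetical. Observed interactions with the lowest scores are treated as candidate spurious interactions, and unobserved pairs with the highest scores as candidate missing interactions.

```python
from itertools import combinations


def build_adjacency(edges):
    """Undirected adjacency sets from an iterable of (protein, protein) pairs."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # self-interactions carry no common-neighbor evidence
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def local_score(adj, u, v):
    """Hypothetical local score: Jaccard overlap of the neighborhoods of u and v.

    This is an assumed stand-in common-neighbor measure, NOT the paper's formula.
    u and v are removed from the union so that a direct edge between them
    neither raises nor lowers their score.
    """
    nu, nv = adj.get(u, set()), adj.get(v, set())
    common = nu & nv
    union = (nu | nv) - {u, v}
    return len(common) / len(union) if union else 0.0


def score_all_pairs(edges):
    """Score every unordered protein pair and record whether it was observed."""
    adj = build_adjacency(edges)
    observed = {frozenset(e) for e in edges if e[0] != e[1]}
    return [
        (local_score(adj, u, v), u, v, frozenset((u, v)) in observed)
        for u, v in combinations(sorted(adj), 2)
    ]


def suspicious_interactions(edges, k):
    """Observed pairs with the LOWEST scores: candidate false positives."""
    scored = [r for r in score_all_pairs(edges) if r[3]]
    scored.sort(key=lambda r: (r[0], r[1], r[2]))
    return [(u, v) for _, u, v, _ in scored[:k]]


def missing_interactions(edges, k):
    """Unobserved pairs with the HIGHEST scores: candidate false negatives."""
    scored = [r for r in score_all_pairs(edges) if not r[3]]
    scored.sort(key=lambda r: (-r[0], r[1], r[2]))
    return [(u, v) for _, u, v, _ in scored[:k]]


# Hypothetical toy network (not DIP data).
toy = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"), ("C", "D"), ("D", "E")]
print(suspicious_interactions(toy, 1))  # [('D', 'E')]
print(missing_interactions(toy, 1))     # [('A', 'D')]
```

In the toy network, D and E share no neighbors, so their observed edge gets the lowest score. A and D are not linked but share both of A's neighbors (B and C), so that unobserved pair scores highest. Scoring every pair is quadratic in the number of proteins. A genome-scale version would likely restrict unobserved candidates to pairs with at least one common neighbor.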