Lin Mingzhi, Hu Bin, Chen Lijuan, Sun Peng, Fan Yi, Wu Ping, Chen Xin
Department of Bioinformatics, Zhejiang University, Hangzhou, People's Republic of China, 310058.
Plant Physiol. 2009 Sep;151(1):34-46. doi: 10.1104/pp.109.141317. Epub 2009 Jul 10.
Knowledge of the protein interaction network supports studies of molecular mechanisms. Several major repositories have been established to collect and organize reported protein interactions. Many interactions have been reported in several model organisms, yet only a very limited number of plant interactions can so far be found in these major databases. Computational identification of potential plant interactions is therefore desirable to facilitate relevant research. In this work, we constructed a support vector machine model to predict potential Arabidopsis (Arabidopsis thaliana) protein interactions based on a variety of indirect evidence. In a 100-iteration bootstrap evaluation, the confidence of our predicted interactions was estimated to be 48.67%, and these interactions were expected to cover 29.02% of the entire interactome. The sensitivity of our model was validated with an independent evaluation data set consisting of newly reported interactions that did not overlap with the examples used in model training and testing. Results showed that our model successfully recognized 28.91% of the new interactions, similar to its expected sensitivity (29.02%). Applying this model to all possible Arabidopsis protein pairs resulted in 224,206 potential interactions, which is the largest and most accurate set of predicted Arabidopsis interactions at present. To facilitate the use of our results, we present the Predicted Arabidopsis Interactome Resource, with detailed annotations and more specific per-interaction confidence measurements. This database and related documents are freely accessible at http://www.cls.zju.edu.cn/pair/.
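The two figures quoted above, confidence (the fraction of predicted interactions that are true) and coverage or sensitivity (the fraction of true interactions that are predicted), correspond to what is usually called precision and recall. The following is a minimal sketch of how such figures can be averaged over bootstrap resamples. It is not the procedure used in the paper: it resamples a fixed set of predictions rather than retraining a support vector machine in each iteration, and the function names, the toy labels, and the seed are all hypothetical.

```python
import random
from statistics import mean


def precision_recall(labels, predictions):
    """Return (precision, recall) for two parallel lists of booleans."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y and p)        # true interactions predicted
    fp = sum(1 for y, p in pairs if not y and p)    # non-interactions predicted
    fn = sum(1 for y, p in pairs if y and not p)    # true interactions missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def bootstrap_precision_recall(labels, predictions, iterations=100, seed=0):
    """Average precision and recall over bootstrap resamples.

    Assumption: each iteration draws len(labels) examples with replacement
    from a fixed list of predictions; no model is retrained here.
    """
    rng = random.Random(seed)
    n = len(labels)
    precisions, recalls = [], []
    for _ in range(iterations):
        idx = [rng.randrange(n) for _ in range(n)]
        p, r = precision_recall(
            [labels[i] for i in idx],
            [predictions[i] for i in idx],
        )
        precisions.append(p)
        recalls.append(r)
    return mean(precisions), mean(recalls)


# Hypothetical toy data: 10 true interactions among 100 protein pairs.
# The classifier flags 3 of the true pairs and 3 of the false pairs.
toy_labels = [True] * 10 + [False] * 90
toy_predictions = [True] * 3 + [False] * 7 + [True] * 3 + [False] * 87

if __name__ == "__main__":
    print(precision_recall(toy_labels, toy_predictions))
    print(bootstrap_precision_recall(toy_labels, toy_predictions))
```

On the toy data the point estimates are a precision of 0.5 and a recall of 0.3; the bootstrap averages scatter around those values. The independent validation described in the abstract plays the same role as the recall term: 28.91% of newly reported interactions were recovered, against an expected 29.02%.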