Predicting domain-domain interaction based on domain profiles with feature selection and support vector machines.
Affiliation
Department of Computer and Information Sciences, University of Delaware 421 Smith Hall, Newark, DE 19716, USA.
Publication information
BMC Bioinformatics. 2010 Oct 29;11:537. doi: 10.1186/1471-2105-11-537.
BACKGROUND
Protein-protein interaction (PPI) plays essential roles in cellular functions. The cost, time and other limitations of current experimental methods have motivated the development of computational methods for predicting PPIs. Because proteins generally interact through their domains rather than as whole molecules, predicting domain-domain interaction (DDI) is an important step toward PPI prediction. Computational methods developed so far have drawn on information from various sources at different levels, from primary sequences to molecular structures to evolutionary profiles.
RESULTS
In this paper, we propose a computational method to predict DDI using support vector machines (SVMs), based on domains represented as interaction profile hidden Markov models (ipHMMs), in which interacting residues in domains are explicitly modeled according to the three-dimensional structural information available in the Protein Data Bank (PDB). Domain features are first extracted as Fisher scores derived from the ipHMM and then selected using singular value decomposition (SVD). Each domain pair is represented by concatenating the selected feature vectors of its two domains and is classified by an SVM trained on these vectors. The method is tested by leave-one-out cross-validation on a set of interacting protein pairs taken from the 3DID database. Its prediction accuracy is significantly higher than that of InterPreTS (Interaction Prediction through Tertiary Structure), an existing PPI prediction method that also uses sequences and complexes of known 3D structure.
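The pipeline described above (Fisher-score features, SVD-based reduction, concatenation of the two domain vectors, SVM classification, leave-one-out evaluation) can be illustrated with the minimal Python/NumPy sketch below. It is not the authors' Matlab implementation, and its assumptions are all hypothetical: sequences are taken as pre-aligned to match states only, ignoring the insert/delete states and the separate interacting/non-interacting match states of a real ipHMM; the Fisher score uses the emission-parameter form n(a, s)/e_s(a) − n(s) from Jaakkola and Haussler's Fisher-kernel work; SVD "selection" is read as projection onto the top-k right singular vectors (k = 5 here); and a linear SVM trained by dual coordinate descent stands in for whatever kernel and solver the paper used. The uniform profile, the toy pairs, and helper names such as `make_toy_pairs` and `svd_projector` are invented for illustration.

```python
# Hypothetical sketch of a Fisher-score -> SVD -> SVM pipeline; not the authors' code.
import numpy as np

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids
IDX = {a: i for i, a in enumerate(ALPHABET)}


def fisher_scores(seq, emissions):
    """Fisher score of log P(seq | profile) w.r.t. match-state emissions.

    Assumption: `seq` is pre-aligned, one residue per match state, so every
    state is visited once and n(s) = 1. Each component is n(a, s)/e_s(a) - n(s),
    i.e. 1/e_s(a) - 1 for the observed residue and -1 for all others.
    """
    counts = np.zeros_like(emissions)
    for s, res in enumerate(seq):
        counts[s, IDX[res]] = 1.0
    return (counts / emissions - 1.0).ravel()


def svd_projector(F, k):
    """Fit on training rows only: centre F and keep its top-k right singular vectors."""
    mu = F.mean(axis=0)
    _, _, vt = np.linalg.svd(F - mu, full_matrices=False)
    basis = vt[:k].T
    return lambda Z: (Z - mu) @ basis


def train_linear_svm(X, y, C=1.0, epochs=30):
    """Hinge-loss linear SVM via dual coordinate descent; bias as a constant column."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    alpha = np.zeros(len(Xb))
    w = np.zeros(Xb.shape[1])
    q = (Xb ** 2).sum(axis=1)  # diagonal of the Gram matrix
    for _ in range(epochs):
        for i in range(len(Xb)):
            grad = y[i] * (w @ Xb[i]) - 1.0  # dual gradient for alpha_i
            new = min(max(alpha[i] - grad / q[i], 0.0), C)  # clip to [0, C]
            w += (new - alpha[i]) * y[i] * Xb[i]
            alpha[i] = new
    return w


def predict(w, X):
    return np.sign(np.hstack([X, np.ones((len(X), 1))]) @ w)


def make_toy_pairs(n_per_class=30, length=3, seed=0):
    """Hypothetical data: 'interacting' pairs carry K at position 0 of domain A
    and E at position 0 of domain B; all other positions are random."""
    rng = np.random.default_rng(seed)
    not_k = [a for a in ALPHABET if a != "K"]
    not_e = [a for a in ALPHABET if a != "E"]
    pairs, labels = [], []
    for label in (1, -1):
        for _ in range(n_per_class):
            tail_a = "".join(rng.choice(list(ALPHABET), length - 1))
            tail_b = "".join(rng.choice(list(ALPHABET), length - 1))
            head_a = "K" if label == 1 else str(rng.choice(not_k))
            head_b = "E" if label == 1 else str(rng.choice(not_e))
            pairs.append((head_a + tail_a, head_b + tail_b))
            labels.append(label)
    return pairs, np.array(labels)


def loo_accuracy(pairs, y, prof_a, prof_b, k=5):
    FA = np.array([fisher_scores(a, prof_a) for a, _ in pairs])
    FB = np.array([fisher_scores(b, prof_b) for _, b in pairs])
    hits = 0
    for i in range(len(pairs)):
        tr = np.arange(len(pairs)) != i  # every pair except the held-out one
        pa, pb = svd_projector(FA[tr], k), svd_projector(FB[tr], k)
        # A domain pair is the concatenation of its two reduced domain vectors.
        w = train_linear_svm(np.hstack([pa(FA[tr]), pb(FB[tr])]), y[tr])
        x_test = np.hstack([pa(FA[i:i + 1]), pb(FB[i:i + 1])])
        hits += int(predict(w, x_test)[0] == y[i])
    return hits / len(pairs)


if __name__ == "__main__":
    pairs, y = make_toy_pairs()
    profile = np.full((3, len(ALPHABET)), 1.0 / len(ALPHABET))  # toy uniform profile
    print(f"LOO accuracy on toy pairs: {loo_accuracy(pairs, y, profile, profile):.2f}")
```

Fitting the SVD inside each leave-one-out fold keeps the held-out pair out of the projection, so the accuracy on the toy data is not inflated by information leaking from the test example.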
CONCLUSIONS
We show that domain-domain interaction prediction can be significantly enhanced by exploiting the information inherent in domain profiles, using Fisher-score features, feature selection by singular value decomposition, and supervised learning with support vector machines. Datasets and source code are freely available on the web at http://liao.cis.udel.edu/pub/svdsvm. The method is implemented in Matlab and supported on Linux and MS Windows.