Wang Liangjiang, Huang Caiyan, Yang Mary Qu, Yang Jack Y
Department of Genetics and Biochemistry, Clemson University, Clemson, SC 29634, USA.
BMC Syst Biol. 2010 May 28;4 Suppl 1(Suppl 1):S3. doi: 10.1186/1752-0509-4-S1-S3.
Understanding how biomolecules interact is a major task of systems biology. To model protein-nucleic acid interactions, it is important to identify the DNA- or RNA-binding residues in proteins. Protein sequence features, including the biochemical properties of amino acids and evolutionary information in the form of a position-specific scoring matrix (PSSM), have been used for DNA- or RNA-binding site prediction. However, the PSSM was designed primarily for PSI-BLAST searches, and it may not contain all the evolutionary information needed for modelling DNA- or RNA-binding sites in protein sequences.
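As a rough illustration of how PSSM rows can be turned into a per-residue feature vector for a classifier, the sketch below concatenates the rows in a sliding window around the target residue. The window scheme, the zero padding at the termini, the function name `window_features`, and the toy matrix are all assumptions made for illustration; the abstract does not specify how the features were encoded.

```python
from typing import List


def window_features(pssm: List[List[float]], center: int, half_width: int = 2) -> List[float]:
    """Concatenate the PSSM rows in a window centred on one residue.

    Hypothetical encoding: positions that fall beyond either end of the
    sequence are filled with zeros so every residue gets a vector of the
    same length.
    """
    n_cols = len(pssm[0])
    features: List[float] = []
    for pos in range(center - half_width, center + half_width + 1):
        if 0 <= pos < len(pssm):
            features.extend(pssm[pos])
        else:
            features.extend([0.0] * n_cols)  # padding beyond the termini
    return features


# Toy PSSM with 4 residues and 3 columns (a real PSSM has 20 columns).
toy_pssm = [
    [1.0, 0.0, -1.0],
    [2.0, 1.0, 0.0],
    [0.0, -2.0, 3.0],
    [-1.0, 1.0, 1.0],
]

# Feature vector for the first residue with one neighbour on each side.
vec = window_features(toy_pssm, center=0, half_width=1)
```

Each residue then contributes one fixed-length vector, which is the kind of input an SVM expects.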
In the present study, several new descriptors of evolutionary information were developed and evaluated for sequence-based prediction of DNA- and RNA-binding residues using support vector machines (SVMs). The new descriptors were shown to improve classifier performance. Interestingly, the best classifiers were obtained by combining the new descriptors with the PSSM, suggesting that the two capture different aspects of evolutionary information for DNA- and RNA-binding site prediction. The SVM classifiers achieved 77.3% sensitivity and 79.3% specificity for prediction of DNA-binding residues, and 71.6% sensitivity and 78.7% specificity for RNA-binding site prediction.
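For readers less familiar with the two reported metrics, the following minimal sketch computes sensitivity (the fraction of true binding residues predicted as binding) and specificity (the fraction of non-binding residues predicted as non-binding) from per-residue labels. The function name and the toy labels are hypothetical and are not data from the study.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels.

    Label 1 marks a binding residue, label 0 a non-binding residue.
    Both classes must be present in y_true.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # binding, predicted binding
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # binding, missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-binding, predicted non-binding
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # non-binding, predicted binding
    return tp / (tp + fn), tn / (tn + fp)


# Toy example: 4 binding residues and 6 non-binding residues.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
sens, spec = sensitivity_specificity(y_true, y_pred)
```

Reporting both values matters here because binding residues are usually a minority class, so overall accuracy alone can look high even when most binding residues are missed.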
Predictions at this level of accuracy may provide useful information for modelling protein-nucleic acid interactions in systems biology studies. We have thus developed a web-based tool called BindN+ (http://bioinfo.ggc.org/bindn+/) to make the SVM classifiers accessible to the research community.