Shao Xiaojian, Tian Yingjie, Wu Lingyun, Wang Yong, Jing Ling, Deng Naiyang
College of Science, China Agricultural University, Beijing 100083, China.
J Theor Biol. 2009 May 21;258(2):289-93. doi: 10.1016/j.jtbi.2009.01.024. Epub 2009 Feb 6.
In this paper, support vector machines (SVMs) are applied to predict nucleic-acid-binding proteins. We constructed two classifiers, one to distinguish DNA-binding proteins and one to distinguish RNA-binding proteins from non-nucleic-acid-binding proteins, using a conjoint triad feature that extracts information directly from the amino acid sequence of a protein. Both self-consistency and jackknife tests show promising results on protein datasets in which the sequence identity is less than 25%. In the self-consistency test, the predictive accuracy is 90.37% for DNA-binding proteins and 89.70% for RNA-binding proteins. In the jackknife test, the predictive accuracies are 78.93% and 76.75%, respectively. Comparisons show that our method is competitive and outperforms previously published sequence-based prediction methods.
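The conjoint triad feature mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not spell out the encoding, so the details here are assumptions taken from the commonly used conjoint triad formulation: amino acids are grouped into seven classes, every window of three consecutive residues is mapped to one of 7 × 7 × 7 = 343 class triads, and the triad counts are rescaled as (count - min) / max. The names `CLASSES` and `conjoint_triad` are hypothetical, and the authors' actual grouping and normalization may differ.

```python
from itertools import product

# ASSUMPTION: seven amino-acid classes as in the commonly used conjoint triad
# formulation; the abstract itself does not list the grouping.
CLASSES = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
AA_TO_CLASS = {aa: i for i, group in enumerate(CLASSES) for aa in group}

# All 343 ordered class triads, each mapped to a fixed feature position.
TRIADS = list(product(range(len(CLASSES)), repeat=3))
TRIAD_INDEX = {triad: i for i, triad in enumerate(TRIADS)}


def conjoint_triad(sequence):
    """Return a 343-dimensional conjoint triad vector for a protein sequence.

    Windows containing a residue outside the 20 standard amino acids
    are skipped (an assumption of this sketch).
    """
    classes = [AA_TO_CLASS.get(aa) for aa in sequence.upper()]
    counts = [0] * len(TRIADS)
    # Slide a window of three consecutive residues along the sequence.
    for window in zip(classes, classes[1:], classes[2:]):
        if None in window:
            continue
        counts[TRIAD_INDEX[window]] += 1
    lo, hi = min(counts), max(counts)
    if hi == 0:
        # No valid triad found (sequence too short or all residues unknown).
        return [0.0] * len(TRIADS)
    # ASSUMPTION: (count - min) / max rescaling, to reduce the effect of length.
    return [(c - lo) / hi for c in counts]


if __name__ == "__main__":
    features = conjoint_triad("MKRAGVDELC")
    print(len(features), sum(1 for f in features if f > 0))
```

A vector like this would be computed for every protein and passed to an SVM; the jackknife test then trains on all proteins but one and predicts the held-out protein, repeating this for each protein in the dataset.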