School of Mathematical Science, Dalian University of Technology, People's Republic of China.
Protein J. 2011 Apr;30(4):229-39. doi: 10.1007/s10930-011-9324-2.
Predicting the catalytic sites of a given enzyme is an important open problem in bioinformatics. Many machine-learning-based methods have recently been developed, and their advantage is that they can take many sequence-based or structural features into account. We found that, although these methods incorporate many kinds of features, protein sequence conservation is the main source of information they use, and it should continue to play an important role in future methods. We therefore tested the ability of several conservation features to predict catalytic sites using a Support Vector Machine (SVM) classifier. Our results suggest that the position-specific scoring matrix (PSSM) performs better than the other features, and that incorporating conservation information from sequentially adjacent sites is more effective than incorporating it from structurally adjacent sites. Moreover, although conservation information is effective for predicting catalytic sites, optimizing the combination of conservation features with other features remains a difficult problem.
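The abstract contrasts two ways of adding neighbor conservation to a residue's feature vector: using sequentially adjacent residues and using structurally adjacent ones. The sketch below illustrates that distinction only; it is not the authors' exact encoding, which the abstract does not specify. It assumes a PSSM stored as one row of scores per residue and one C-alpha coordinate per residue. The function names, the default window half-width of 3 and the 8.0 Å distance cutoff are hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence

# Assumed layout: one row of conservation scores per residue
# (e.g. 20 PSSM columns); the width is taken from the first row.
Pssm = List[List[float]]


def sequence_window_features(pssm: Pssm, index: int, half_window: int = 3) -> List[float]:
    """Concatenate the PSSM rows of residues index-half_window .. index+half_window.

    Positions that fall before the N-terminus or after the C-terminus
    are zero-padded so every vector has the same length.
    (half_window=3 is a hypothetical default, not taken from the paper.)
    """
    width = len(pssm[0])
    features: List[float] = []
    for offset in range(-half_window, half_window + 1):
        j = index + offset
        if 0 <= j < len(pssm):
            features.extend(pssm[j])
        else:
            features.extend([0.0] * width)
    return features


def structure_neighbor_features(
    pssm: Pssm,
    coords: Sequence[Sequence[float]],
    index: int,
    cutoff: float = 8.0,
) -> List[float]:
    """Return the target residue's PSSM row followed by the mean PSSM row
    of all other residues whose C-alpha lies within `cutoff` angstroms.

    If no residue is within the cutoff, the neighbor part is all zeros.
    (cutoff=8.0 is a hypothetical default, not taken from the paper.)
    """
    width = len(pssm[0])
    neighbors = [
        j
        for j in range(len(pssm))
        if j != index and math.dist(coords[index], coords[j]) <= cutoff
    ]
    mean = [0.0] * width
    for j in neighbors:
        for k in range(width):
            mean[k] += pssm[j][k]
    if neighbors:
        mean = [value / len(neighbors) for value in mean]
    return list(pssm[index]) + mean


if __name__ == "__main__":
    # Toy 3-residue example with 2 score columns (real PSSMs have 20).
    toy_pssm = [[1.0, -1.0], [2.0, 0.0], [0.0, 3.0]]
    toy_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (20.0, 0.0, 0.0)]
    print(sequence_window_features(toy_pssm, 0, half_window=1))
    print(structure_neighbor_features(toy_pssm, toy_coords, 0, cutoff=8.0))
```

In a full pipeline, one vector per residue, labeled catalytic or non-catalytic, would then be passed to an SVM implementation such as scikit-learn's `SVC`. Comparing the cross-validated performance of the two encodings is one way to test the abstract's claim that sequence neighbors carry more useful conservation signal than structural neighbors.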