Prediction of protein binding sites in protein structures using hidden Markov support vector machine.

Affiliation

Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, PR China.

Publication information

BMC Bioinformatics. 2009 Nov 20;10:381. doi: 10.1186/1471-2105-10-381.

Abstract

BACKGROUND

Predicting the binding sites between two interacting proteins provides important clues to protein function. Recent work on protein binding site prediction has relied mainly on well-known machine learning techniques such as artificial neural networks, support vector machines and conditional random fields. However, prediction performance is still too low for practical use, so new algorithms, theories and features need to be explored to improve it further.

RESULTS

In this study, we introduce a novel machine learning model, the hidden Markov support vector machine, for protein binding site prediction. The model treats protein binding site prediction as a sequential labelling task based on the maximum margin criterion. Common features derived from protein sequences and structures, including the protein sequence profile and residue accessible surface area, are used to train the hidden Markov support vector machine. When tested on six data sets, the method performs better than several state-of-the-art methods based on artificial neural networks, support vector machines and conditional random fields. Furthermore, its running time is several orders of magnitude shorter than that of the compared methods.
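To make the sequential labelling view concrete, the sketch below decodes the best label sequence for a short chain of residues with the Viterbi algorithm. It is an illustration only, not the authors' implementation: the function name `viterbi_decode`, the toy score matrices and the label coding (0 = non-binding, 1 = binding) are all assumptions. In a trained model the per-residue scores would presumably come from a learned weight vector applied to features such as the sequence profile and accessible surface area.

```python
import numpy as np


def viterbi_decode(emission, transition):
    """Return the highest-scoring label sequence for one protein chain.

    emission:   (n_residues, n_labels) array of per-residue label scores.
    transition: (n_labels, n_labels) array; transition[i, j] scores label i
                at one residue followed by label j at the next.
    """
    n, k = emission.shape
    score = emission[0].copy()              # best score ending in each label
    backptr = np.zeros((n, k), dtype=int)   # best previous label per position
    for t in range(1, n):
        # candidate[i, j]: best path ending in label i at t-1, then moving to j
        candidate = score[:, None] + transition
        backptr[t] = candidate.argmax(axis=0)
        score = candidate.max(axis=0) + emission[t]
    # Trace the best path backwards from the best final label.
    labels = [int(score.argmax())]
    for t in range(n - 1, 0, -1):
        labels.append(int(backptr[t, labels[-1]]))
    return labels[::-1]


# Toy scores (hypothetical): label 0 = non-binding, label 1 = binding.
emission = np.array([
    [0.0, 1.0],
    [0.2, 0.0],   # weak evidence for non-binding
    [0.0, 1.0],
    [1.5, 0.0],
])
# Neighbouring residues are rewarded for sharing a label.
transition = np.array([
    [0.3, -0.3],
    [-0.3, 0.3],
])

independent = emission.argmax(axis=1).tolist()   # ignores neighbouring labels
joint = viterbi_decode(emission, transition)     # uses neighbouring labels
print(independent, joint)
```

In this toy case, labelling each residue independently gives `[1, 0, 1, 0]`, while joint decoding gives `[1, 1, 1, 0]`: the weakly scored second residue is pulled towards the label of its neighbours, which is the kind of dependency a per-residue classifier cannot use.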

CONCLUSION

The improved prediction performance and computational efficiency of the method based on the hidden Markov support vector machine can be attributed to three factors. First, the relation between the labels of neighbouring residues is useful for protein binding site prediction. Second, the kernel trick is very advantageous in this field. Third, with the cutting-plane algorithm, the complexity of the training step for the hidden Markov support vector machine is linear in the number of training samples.
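The abstract does not state the training objective, so the following is only one standard way to write max-margin sequence labelling, the one-slack structural SVM formulation commonly paired with cutting-plane training. All symbols are assumptions introduced for illustration: $x_i$ is a protein chain, $y_i$ its true label sequence, $\bar{y}_i$ any competing label sequence, $\Psi$ a joint feature map combining per-residue and neighbouring-label features, $\Delta$ a loss such as the number of mislabelled residues, and $C$ a regularisation constant.

```latex
\min_{\mathbf{w},\,\xi \ge 0} \quad
  \frac{1}{2}\lVert \mathbf{w} \rVert^{2} + C\,\xi
\quad \text{s.t.} \quad
\forall (\bar{y}_1,\dots,\bar{y}_n):\;
\frac{1}{n}\,\mathbf{w}^{\top} \sum_{i=1}^{n}
  \bigl[\Psi(x_i, y_i) - \Psi(x_i, \bar{y}_i)\bigr]
\;\ge\;
\frac{1}{n}\sum_{i=1}^{n} \Delta(y_i, \bar{y}_i) - \xi
```

Under this formulation, a cutting-plane solver keeps a small working set of constraints and repeatedly adds the most violated one, found by a loss-augmented Viterbi pass over each training sequence. Each such pass costs time linear in the number of training sequences, which is consistent with the linear training complexity described above.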
