Prediction of protein structural classes using support vector machines.

Author information

Sun X-D, Huang R-B

Affiliation

College of Life Science and Biotechnology, Guangxi University, Nanning, Guangxi, China.

Publication information

Amino Acids. 2006 Jun;30(4):469-75. doi: 10.1007/s00726-005-0239-0. Epub 2006 Apr 20.

Abstract

The support vector machine (SVM), a machine-learning method, is used to predict four structural classes, namely mainly alpha, mainly beta, alpha-beta, and few secondary structures (fss), from the topology level of the CATH protein structure database. In binary classification, any two structural classes that do not share secondary structure elements, such as alpha and beta elements, can be classified with accuracy as high as 90%. The accuracy, however, drops below 70% if the classes to be separated have structural elements in common. Our study also shows that feature spaces of dimension 20^2 = 400 (dipeptide) and 20^3 = 8000 (tripeptide) give nearly the same prediction accuracy. Across all four structural classes, multi-class classification gives an overall accuracy of about 52%, indicating that the multi-class classification technique for support vector machines may still need further improvement in future investigations.
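To make the feature-space dimensions concrete, the following is a minimal sketch of how a dipeptide (k = 2) or tripeptide (k = 3) composition vector could be computed from a sequence. The abstract does not describe the exact encoding, so the overlapping-window counting, the frequency normalization, and the names `AMINO_ACIDS` and `kmer_composition` are assumptions made for illustration, not the authors' implementation.

```python
from itertools import product

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def kmer_composition(sequence, k=2):
    """Return a normalized k-mer frequency vector of length 20**k.

    Hypothetical helper: counts overlapping k-residue windows and
    divides by the number of valid windows (an assumed normalization).
    """
    kmers = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)
    total = 0
    for i in range(len(sequence) - k + 1):
        pos = index.get(sequence[i:i + k])
        if pos is not None:  # skip windows with non-standard residues
            counts[pos] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)
    return [c / total for c in counts]


dipeptide = kmer_composition("ACDACD", k=2)
tripeptide = kmer_composition("ACDACD", k=3)
print(len(dipeptide), len(tripeptide))
```

Vectors of this kind could then be passed to any SVM implementation. The tripeptide vector is 20 times longer and, for typical protein lengths, much sparser, which is consistent with the abstract's observation that it gives nearly the same accuracy as the dipeptide vector.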
