Chen Chao, Chen Lixuan, Zou Xiaoyong, Cai Peixiang
School of Traditional Chinese Medicine, Guangdong Pharmaceutical University, Guangzhou 510006, PR China.
Protein Pept Lett. 2009;16(1):27-31. doi: 10.2174/092986609787049420.
Protein secondary structure carries information about local structural arrangements. The large majority of successful methods for predicting secondary structure are based on multiple sequence alignment. However, multiple alignment fails to give accurate results when a protein sequence has low homology to known sequences. To address this, we propose a novel method for predicting secondary structure content through a comprehensive sequence representation. The method employs a support vector machine (SVM) regression system and adopts a different form of pseudo amino acid composition (PseAAC), which partially accounts for sequence-order effects, to represent protein samples. Both the self-consistency test and the independent-dataset test show that the trained SVM has remarkable power in capturing the relationship between the PseAAC and the content of protein secondary structural elements, namely alpha-helix, 3(10)-helix, pi-helix, beta-strand, beta-bridge, turn, bend, and the remaining random coil. The results are superior to or competitive with those of popular methods, which indicates that the present method may at least serve as an alternative to existing predictors in this area.
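To make the sequence representation concrete, the sketch below computes a PseAAC feature vector in the classic form (20 normalized amino acid frequencies followed by lambda sequence-order correlation factors). It is only an illustration under stated assumptions: the abstract says the paper adopts a different PseAAC variant without specifying it, so the single Kyte-Doolittle hydropathy property, the function name `pseaac`, and the defaults `lam=3` and `weight=0.05` are all choices made here, not the authors' settings.

```python
import math

# Kyte-Doolittle hydropathy values. This property is an assumed stand-in;
# the abstract does not say which physicochemical properties the paper uses.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
AMINO_ACIDS = sorted(HYDROPATHY)  # fixed order for the first 20 components


def _standardize(scale):
    """Rescale a property to zero mean and unit variance over the 20 residues."""
    values = list(scale.values())
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return {aa: (v - mean) / std for aa, v in scale.items()}


H = _standardize(HYDROPATHY)


def pseaac(sequence, lam=3, weight=0.05):
    """Return a (20 + lam)-dimensional classic-style PseAAC vector.

    lam    : number of sequence-order correlation tiers (hypothetical default)
    weight : weight of the sequence-order terms (hypothetical default)
    """
    seq = sequence.upper()
    if any(aa not in H for aa in seq):
        raise ValueError("sequence contains a non-standard residue")
    n = len(seq)
    if lam >= n:
        raise ValueError("lam must be smaller than the sequence length")

    # Conventional amino acid composition: occurrence frequency of each residue.
    freqs = [seq.count(aa) / n for aa in AMINO_ACIDS]

    # Tier-k correlation factor: mean squared property difference between
    # residues that are k positions apart along the chain.
    thetas = []
    for k in range(1, lam + 1):
        total = sum((H[seq[i]] - H[seq[i + k]]) ** 2 for i in range(n - k))
        thetas.append(total / (n - k))

    # Joint normalization so that all components sum to 1.
    denom = sum(freqs) + weight * sum(thetas)
    return [f / denom for f in freqs] + [weight * t / denom for t in thetas]
```

A vector like this could then be passed to an SVM regressor (for example scikit-learn's `SVR`), with one regressor trained per structural class so that each predicts the fraction of residues in that class. That pairing is likewise an assumption about how the pipeline might be assembled, not a description of the authors' implementation.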