Li Zhong, Wang Jing, Zhang Shunpu, Zhang Qifeng, Wu Wuming
College of Science, Zhejiang Sci-Tech University, Hangzhou 30018, China.
Gene. 2017 Jun 30;618:8-13. doi: 10.1016/j.gene.2017.03.011. Epub 2017 Mar 16.
The scheme used to encode a protein sequence can strongly affect the accuracy of protein secondary structure prediction. In this paper, we propose a novel hybrid encoding method for protein secondary structure prediction, based on the physicochemical properties of amino acids and their tendency factors. Principal component analysis (PCA) is first applied to the physicochemical properties of the amino acids to construct a 3-bit code, and the three tendency factors of each amino acid are then calculated to generate a second 3-bit code. The two 3-bit codes are fused into a novel hybrid 6-bit code. In addition, before predicting the secondary structure, we perform a geometry-based similarity comparison of protein primary structures between the reference set and the test set. Finally, we use a support vector machine (SVM) to predict the amino acids that are not detected by the primary-structure similarity comparison. Experimental results show that our method achieves a satisfactory improvement in the accuracy of protein secondary structure prediction.
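The encoding steps described above can be illustrated with a minimal, stdlib-only sketch. It is not the authors' implementation. The abstract does not give the property table, the exact tendency-factor formula, or the SVM input layout, so the sketch makes several assumptions. Properties are z-scored before PCA. The tendency factors are Chou-Fasman-style propensities over an assumed 3-state alphabet (H, E, C). Each residue's 6-value code is fed to the classifier through a zero-padded sliding window. The geometry-based similarity comparison is omitted entirely. All function names (`pca_codes`, `tendency_factors`, `hybrid_encode`, `window_features`) and the toy property values are hypothetical.

```python
import math
from collections import Counter

STATES = ("H", "E", "C")  # assumed 3-state alphabet: helix, strand, coil


def pca_codes(props, k=3, iters=500):
    """Map each amino acid to its scores on the top-k principal components of a
    z-scored property table (hypothetical input: aa -> list of property values).
    Eigenvectors are found by power iteration with deflation."""
    aas = sorted(props)
    X = [props[a] for a in aas]
    n, d = len(X), len(X[0])
    mean = [sum(r[j] for r in X) / n for j in range(d)]
    std = [math.sqrt(sum((r[j] - mean[j]) ** 2 for r in X) / (n - 1)) or 1.0
           for j in range(d)]
    Z = [[(r[j] - mean[j]) / std[j] for j in range(d)] for r in X]
    cov = [[sum(z[i] * z[j] for z in Z) / (n - 1) for j in range(d)]
           for i in range(d)]
    comps = []
    for _ in range(k):
        v = [1.0 + j for j in range(d)]  # arbitrary non-uniform start vector
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w)) or 1.0
            v = [x / norm for x in w]
        # Rayleigh quotient of the unit vector v gives its eigenvalue.
        lam = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
        comps.append(v)
        # Deflate so the next pass converges to the next-largest component.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    return {a: [sum(z[j] * c[j] for j in range(d)) for c in comps]
            for a, z in zip(aas, Z)}


def tendency_factors(reference):
    """Chou-Fasman-style propensity (an assumed definition):
    (fraction of state-s residues that are aa) / (fraction of all residues that are aa).
    reference: iterable of (sequence, secondary-structure string) pairs."""
    pair, aa_n, st_n = Counter(), Counter(), Counter()
    for seq, ss in reference:
        for aa, s in zip(seq, ss):
            pair[aa, s] += 1
            aa_n[aa] += 1
            st_n[s] += 1
    total = sum(aa_n.values())
    return {aa: [(pair[aa, s] / st_n[s]) / (aa_n[aa] / total) if st_n[s] else 0.0
                 for s in STATES]
            for aa in aa_n}


def hybrid_encode(seq, pca, tend):
    """Concatenate 3 PCA scores and 3 tendency factors into a 6-value code per
    residue; residues unseen in the reference set get zero tendency factors."""
    return [pca[a] + tend.get(a, [0.0] * 3) for a in seq]


def window_features(codes, half=3):
    """Flatten a (2*half+1)-residue window around each position, zero-padding
    past either end (the window size is an assumption)."""
    width = len(codes[0]) if codes else 0
    feats = []
    for i in range(len(codes)):
        row = []
        for j in range(i - half, i + half + 1):
            row.extend(codes[j] if 0 <= j < len(codes) else [0.0] * width)
        feats.append(row)
    return feats


# Usage with made-up illustrative values, NOT real property scales.
props = {
    "A": [1.0, 2.0, 0.5], "G": [0.2, 1.0, 0.9],
    "L": [2.5, 3.5, 0.1], "S": [0.4, 1.8, 0.7],
}
pca = pca_codes(props)
tend = tendency_factors([("AAGL", "HHCE")])
codes = hybrid_encode("GASL", pca, tend)
X = window_features(codes, half=2)
print(len(codes[0]), len(X[0]))  # prints: 6 30
```

Each row of `X` is a fixed-length feature vector for one residue. Rows like these could then be passed to an off-the-shelf SVM classifier such as scikit-learn's `sklearn.svm.SVC`. Which residues go to the SVM, and which are instead resolved by the similarity comparison, is a detail the abstract does not specify.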