Liu H X, Zhang R S, Yao X J, Liu M C, Hu Z D, Fan B T
Department of Chemistry, Lanzhou University, Lanzhou 730000, China.
J Chem Inf Comput Sci. 2004 Jan-Feb;44(1):161-7. doi: 10.1021/ci034173u.
A support vector machine (SVM), a relatively new type of learning machine, was used for the first time to develop a QSPR model relating the structures of 35 amino acids to their isoelectric points. Molecular structures were represented by descriptors calculated from the structure alone. Seven descriptors were selected by GA-PLS, a hybrid approach that uses a genetic algorithm (GA) as the optimization method and partial least squares (PLS) as the statistical method for variable selection. These seven descriptors served as inputs to radial basis function neural networks (RBFNNs) and to the SVM to predict the isoelectric point of an amino acid. The best QSPR model was the one based on the SVM, which gave a root-mean-square error of 0.2383 and a prediction correlation coefficient R = 0.9702 for the whole data set. These results indicate that GA-PLS is an effective method for variable selection and that the SVM is a promising tool for nonlinear approximation.
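The two figures of merit quoted above, the root-mean-square error and the correlation coefficient R between observed and predicted isoelectric points, can be computed as in the minimal sketch below. It assumes R is the Pearson correlation coefficient, which the abstract does not state explicitly. The function names `rmse` and `pearson_r` and the sample values are hypothetical and are not taken from the paper.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error between observed and predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def pearson_r(observed, predicted):
    """Pearson correlation coefficient between observed and predicted values."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_o == 0 or var_p == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_o * var_p)


# Hypothetical isoelectric points for illustration only; not data from the paper.
observed_pi = [5.0, 6.0, 7.0, 8.0]
predicted_pi = [5.5, 5.5, 7.5, 7.5]

print(f"RMSE = {rmse(observed_pi, predicted_pi):.4f}")
print(f"R    = {pearson_r(observed_pi, predicted_pi):.4f}")
```

Applied to the observed and model-predicted isoelectric points of all 35 amino acids, these two functions would yield values of the kind reported in the abstract.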