Liu H X, Xue C X, Zhang R S, Yao X J, Liu M C, Hu Z D, Fan B T
Department of Chemistry, Lanzhou University, Lanzhou 730000, China.
J Chem Inf Comput Sci. 2004 Nov-Dec;44(6):1979-86. doi: 10.1021/ci049891a.
The support vector machine (SVM), a new method, and the heuristic method (HM) were used for the first time to develop nonlinear and linear models, respectively, relating the capacity factor (log k) of 75 peptides to seven molecular descriptors. The descriptors representing the structural features of the compounds were limited to constitutional and topological descriptors, which can be obtained easily without optimizing the molecular structure. The seven descriptors selected by the heuristic method in CODESSA were used as inputs for the SVM. The results obtained by SVM were compared with those obtained by the heuristic method, and the SVM model gave better predictions. For the test set, a predictive correlation coefficient R = 0.9801 and a root-mean-square error of 0.1523 were obtained, so the predictions are in very good agreement with the experimental values. The linear model from the heuristic method, however, is easier for a chemist to understand and is ready to use. This paper provides a new and effective method for predicting the chromatographic retention of peptides, and some insight into the structural features related to their capacity factor.
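The two test-set statistics quoted in the abstract, the correlation coefficient R and the root-mean-square error, can be illustrated with a minimal sketch. It assumes R is the Pearson correlation between experimental and predicted log k values and that the root-mean-square error divides by the number of samples; the abstract does not state either convention. The function names and the sample values are hypothetical and are not data from the paper.

```python
import math


def pearson_r(observed, predicted):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    # Sums of centered cross-products and centered squares.
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_o == 0 or var_p == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_o * var_p)


def rmse(observed, predicted):
    """Root-mean-square error, dividing by the sample count (assumed convention)."""
    n = len(observed)
    if n != len(predicted) or n == 0:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


if __name__ == "__main__":
    # Hypothetical log k values for illustration only (not from the paper).
    experimental = [0.50, 1.20, 1.85, 2.40, 3.10]
    predicted = [0.62, 1.10, 1.95, 2.30, 3.00]
    print("R    =", round(pearson_r(experimental, predicted), 4))
    print("RMSE =", round(rmse(experimental, predicted), 4))
```

An R close to 1 together with a small error relative to the range of log k would correspond to the "very good agreement" described in the abstract.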