Qin Li-Tang, Liu Shu-Shen, Liu Hai-Ling, Tong Juan
Key Laboratory of Yangtze River Water Environment, Ministry of Education, College of Environmental Science and Engineering, Tongji University, Shanghai 200092, China.
J Chromatogr A. 2009 Jul 3;1216(27):5302-12. doi: 10.1016/j.chroma.2009.05.016. Epub 2009 May 15.
Quantitative structure-retention relationship (QSRR) models were built for a data set of 96 essential oil components and used to predict their gas chromatographic (GC) retention times (t_R). Multiple linear regression (MLR), principal component regression (PCR), and partial least squares (PLS) were applied to build different QSRR models from 13 nonzero E-state indices and 56 descriptors calculated with the TSAR software, and the three chemometric methods were compared for modeling the GC t_R values of essential oils. For the whole data set, the best model, an MLR model (model M2), appears to have the best predictive power (r² = 0.9689 and q² = 0.9631). The whole data set was split into a training set of 72 compounds and a test set of 24 compounds. Among the models built on the training set, the MLR model gave the highest r² (0.9756) and q² (0.9693). The best PLS model built on the training set not only showed good internal predictive power (r² = 0.9703 and q² = 0.9633) but also offered the highest external predictive power (R² = 0.9588 and q²(ext) = 0.9572). The results showed that two E-state indices (sssCH and sOH) and five molecular connectivity indices (¹χ_B, ²χ_p, ³χ_C, ⁴χ_C, and ⁶χ_p) are closely related to the GC t_R values of essential oils. The applicability domain of the QSRR models was defined by the control leverage value (h*), and the models can be used to predict unknown compounds that fall within this domain.
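The abstract reports r² and q² for fitted models, and R² and q²(ext) for the 24-compound test set, without stating how they are computed. The following is a minimal sketch of how such statistics are often obtained for the MLR case only. It assumes ordinary least squares with an intercept, leave-one-out (LOO) cross-validation for q², and a q²(ext) referenced to the training-set mean. These are common QSRR conventions but are assumptions here, and the helper names (`fit_mlr`, `q2_loo`, `q2_ext`) are hypothetical.

```python
# Hedged sketch, not the authors' code: MLR with LOO q^2 and external q^2.
# Assumptions (not stated in the abstract): ordinary least squares with an
# intercept, leave-one-out cross-validation for q^2, and q^2(ext) computed
# against the training-set mean. All function names are hypothetical.
from typing import List

Matrix = List[List[float]]


def solve(a: Matrix, b: List[float]) -> List[float]:
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented copy; a is not modified
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_mlr(X: Matrix, y: List[float]) -> List[float]:
    """OLS coefficients [b0, b1, ..., bp] from the normal equations (A^T A) b = A^T y."""
    A = [[1.0] + list(row) for row in X]  # prepend an intercept column
    k = len(A[0])
    ata = [[sum(r[i] * r[j] for r in A) for j in range(k)] for i in range(k)]
    aty = [sum(r[i] * yi for r, yi in zip(A, y)) for i in range(k)]
    return solve(ata, aty)


def predict(coef: List[float], X: Matrix) -> List[float]:
    """Apply fitted coefficients (intercept first) to descriptor rows."""
    return [coef[0] + sum(c * v for c, v in zip(coef[1:], row)) for row in X]


def ss_about(values: List[float], ref: float) -> float:
    """Sum of squared deviations of values from a reference value."""
    return sum((v - ref) ** 2 for v in values)


def r2(y: List[float], yhat: List[float]) -> float:
    """Coefficient of determination of fitted values."""
    rss = sum((a - b) ** 2 for a, b in zip(y, yhat))
    return 1.0 - rss / ss_about(y, sum(y) / len(y))


def q2_loo(X: Matrix, y: List[float]) -> float:
    """LOO q^2 = 1 - PRESS / SS_tot; each compound is predicted by a model fitted without it."""
    press = 0.0
    for i in range(len(y)):
        coef = fit_mlr(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - predict(coef, [X[i]])[0]) ** 2
    return 1.0 - press / ss_about(y, sum(y) / len(y))


def q2_ext(y_train: List[float], y_test: List[float], yhat_test: List[float]) -> float:
    """External q^2 on a test set, referenced to the training-set mean."""
    press = sum((a - b) ** 2 for a, b in zip(y_test, yhat_test))
    return 1.0 - press / ss_about(y_test, sum(y_train) / len(y_train))
```

Here `X` is a list of descriptor rows and `y` holds the measured t_R values. `q2_loo(X, y)` gives the internal statistic, and `q2_ext(y_train, y_test, predict(fit_mlr(X_train, y_train), X_test))` gives the external one. For least squares, the LOO q² can never exceed r², because each left-out residual equals the ordinary residual divided by (1 − hᵢᵢ), where hᵢᵢ is the compound's leverage.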
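The abstract defines the applicability domain through a control leverage value h* but does not give its formula. The following is a minimal sketch that assumes the widely used hat-matrix leverage hᵢ = xᵢᵀ(XᵀX)⁻¹xᵢ, computed from the training descriptors with an intercept column. It also assumes the common threshold h* = 3(p + 1)/n, where p is the number of descriptors and n the number of training compounds. Both choices are assumptions rather than the paper's stated method, and the helper names (`leverages`, `warning_leverage`, `in_domain`) are hypothetical.

```python
# Hedged sketch, not the authors' code: leverage-based applicability domain.
# Assumptions: leverages come from the training descriptor matrix X (with an
# intercept column), and the threshold is h* = 3(p + 1) / n. Names are hypothetical.
from typing import List

Matrix = List[List[float]]


def solve(a: Matrix, b: List[float]) -> List[float]:
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented copy; a is not modified
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def leverages(X_train: Matrix, X_query: Matrix) -> List[float]:
    """h = x^T (X^T X)^{-1} x for each query row, with X the training matrix."""
    A = [[1.0] + list(row) for row in X_train]  # intercept column
    k = len(A[0])
    xtx = [[sum(r[i] * r[j] for r in A) for j in range(k)] for i in range(k)]
    result = []
    for row in X_query:
        x = [1.0] + list(row)
        z = solve(xtx, x)  # z = (X^T X)^{-1} x, without forming the inverse
        result.append(sum(xi * zi for xi, zi in zip(x, z)))
    return result


def warning_leverage(n_train: int, n_descriptors: int) -> float:
    """Common convention h* = 3(p + 1) / n (assumed, not taken from the paper)."""
    return 3.0 * (n_descriptors + 1) / n_train


def in_domain(X_train: Matrix, X_query: Matrix) -> List[bool]:
    """True where a query compound's leverage does not exceed h*."""
    h_star = warning_leverage(len(X_train), len(X_train[0]))
    return [h <= h_star for h in leverages(X_train, X_query)]
```

The leverages of the training compounds always lie between 1/n and 1 and sum to p + 1. A query compound with h > h* lies outside the descriptor region the model was fitted on, so its predicted t_R is an extrapolation.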