Put R, Daszykowski M, Baczek T, Vander Heyden Y
FABI, Department of Analytical Chemistry and Pharmaceutical Technology, Pharmaceutical Institute, Vrije Universiteit Brussel-VUB, Laarbeeklaan 103, B-1090 Brussels, Belgium.
J Proteome Res. 2006 Jul;5(7):1618-25. doi: 10.1021/pr0600430.
A quantitative structure-retention relationship (QSRR) analysis was performed on the chromatographic retention data of 90 peptides, measured by gradient-elution reversed-phase liquid chromatography, and a large set of molecular descriptors computed for each peptide. Such an approach may be useful in proteomics research to improve the correct identification of peptides. A principal component analysis of the set of 1726 molecular descriptors revealed a high information overlap in the descriptor space. Because variable selection is therefore advisable, peptide retention was modeled with uninformative variable elimination partial least squares (UVE-PLS) in addition to classic partial least squares (PLS) regression. The Kennard and Stone algorithm was used to select a calibration set (63 peptides) from the available samples. This set was used to build the QSRR models. The remaining 27 peptides were used as an independent external test set to evaluate the predictive power of the constructed models. The UVE-PLS model consists of only 5 components (compared to 7 components in the best PLS model) and has the best predictive properties, i.e., the average error on the retention time is less than 30 s. Compared with stepwise regression and an empirical model, the UVE-PLS model gives better and much better predictions, respectively.
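The calibration/test split described in the abstract can be illustrated with a minimal sketch of Kennard and Stone selection. This is not the authors' code. It assumes Euclidean distances computed directly on the descriptor matrix (the abstract does not state the distance measure or any preprocessing), the function name `kennard_stone` is hypothetical, and the data are random numbers standing in for a peptide-by-descriptor matrix.

```python
import numpy as np

def kennard_stone(X, n_select):
    """Return indices of n_select rows of X chosen by the Kennard-Stone rule."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and the number of samples")
    # Pairwise Euclidean distances between all samples (assumed metric).
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    # Start with the two samples that are farthest apart.
    i, j = np.unravel_index(np.argmax(dist), dist.shape)
    selected = [int(i), int(j)]
    remaining = [k for k in range(n) if k not in selected]
    while len(selected) < n_select:
        # For each remaining sample, distance to its nearest selected sample.
        min_dist = dist[np.ix_(remaining, selected)].min(axis=1)
        # Add the remaining sample that is farthest from the selected set.
        pick = remaining[int(np.argmax(min_dist))]
        selected.append(pick)
        remaining.remove(pick)
    return selected

# Synthetic stand-in for a 90-peptide descriptor matrix; not the paper's data.
rng = np.random.default_rng(0)
X = rng.normal(size=(90, 20))
calibration_idx = kennard_stone(X, 63)
test_idx = sorted(set(range(90)) - set(calibration_idx))
print(len(calibration_idx), len(test_idx))  # 63 27
```

Because each new sample is the one farthest from those already chosen, the calibration set tends to span the descriptor space uniformly, and the samples left over for testing lie within the region covered by the calibration samples.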