Retention prediction of peptides based on uninformative variable elimination by partial least squares.

Authors

Put R, Daszykowski M, Baczek T, Vander Heyden Y

Affiliation

FABI, Department of Analytical Chemistry and Pharmaceutical Technology, Pharmaceutical Institute, Vrije Universiteit Brussel-VUB, Laarbeeklaan 103, B-1090 Brussels, Belgium.

Publication information

J Proteome Res. 2006 Jul;5(7):1618-25. doi: 10.1021/pr0600430.

DOI:10.1021/pr0600430

PMID:16823969

Abstract

A quantitative structure-retention relationship (QSRR) analysis was performed on the chromatographic retention data of 90 peptides, measured by gradient-elution reversed-phase liquid chromatography, and on a large set of molecular descriptors computed for each peptide. Such an approach may be useful in proteomics research to improve the correct identification of peptides. A principal component analysis of the 1726 molecular descriptors reveals a high information overlap in the descriptor space. Since variable selection is therefore advisable, the retention of the peptides was modeled with uninformative variable elimination partial least squares (UVE-PLS), in addition to classic partial least squares (PLS) regression. The Kennard and Stone algorithm was used to select a calibration set (63 peptides) from the available samples. This set was used to build the QSRR models. The remaining 27 peptides were used as an independent external test set to evaluate the predictive power of the constructed models. The UVE-PLS model consists of only 5 components (compared to 7 components in the best PLS model) and has the best predictive properties: the average error on the retention time is less than 30 s. Compared with stepwise regression and with an empirical model, the UVE-PLS model gives better and much better predictions, respectively.
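The calibration/test split described in the abstract can be illustrated with a minimal sketch of Kennard and Stone selection. This is not the authors' code: the function name `kennard_stone`, the use of plain Euclidean distance, and the toy data are assumptions made for illustration, and the abstract does not say in which (scaled or unscaled) descriptor space the distances were computed. The rule itself is the standard one: seed the set with the two most distant samples, then repeatedly add the sample whose distance to its nearest already-selected sample is largest.

```python
import math


def kennard_stone(X, n_select):
    """Return indices of n_select rows of X chosen by the Kennard-Stone rule.

    X is a list of equal-length numeric sequences (one row per sample).
    Euclidean distance is an assumption of this sketch.
    """
    n = len(X)
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and the number of samples")

    # Full pairwise distance matrix; fine for small sets such as 90 peptides.
    dist = [[math.dist(a, b) for b in X] for a in X]

    # Seed the selection with the two samples that are farthest apart.
    i0, j0 = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: dist[pair[0]][pair[1]],
    )
    selected = [i0, j0]
    remaining = [k for k in range(n) if k not in (i0, j0)]

    # For every unselected sample, track the distance to its nearest selected one.
    nearest = {k: min(dist[k][i0], dist[k][j0]) for k in remaining}

    while len(selected) < n_select:
        # Pick the sample that is farthest from everything selected so far.
        nxt = max(remaining, key=lambda k: nearest[k])
        selected.append(nxt)
        remaining.remove(nxt)
        del nearest[nxt]
        for k in remaining:
            nearest[k] = min(nearest[k], dist[k][nxt])

    return selected


# Toy one-dimensional example (hypothetical data, not from the paper).
points = [[0.0], [1.0], [2.0], [10.0], [5.0]]
print(kennard_stone(points, 3))  # [0, 3, 4]
```

With a 90-row descriptor matrix, `kennard_stone(X, 63)` would return the calibration indices, and the 27 indices not returned would form the external test set, mirroring the 63/27 split reported above.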

Similar articles

The evaluation of two-step multivariate adaptive regression splines for chromatographic retention prediction of peptides.

Proteomics. 2007 May;7(10):1664-77. doi: 10.1002/pmic.200600676.

Advanced QSRR modeling of peptides behavior in RPLC.

Talanta. 2010 Jun 15;81(4-5):1711-8. doi: 10.1016/j.talanta.2010.03.028. Epub 2010 Mar 25.

Predicting liquid chromatographic retention times of peptides from the Drosophila melanogaster proteome by machine learning approaches.

Anal Chim Acta. 2009 Jun 30;644(1-2):10-6. doi: 10.1016/j.aca.2009.04.010. Epub 2009 Apr 14.

Comparative multiple quantitative structure-retention relationships modeling of gas chromatographic retention time of essential oils using multiple linear regression, principal component regression, and partial least squares techniques.

J Chromatogr A. 2009 Jul 3;1216(27):5302-12. doi: 10.1016/j.chroma.2009.05.016. Epub 2009 May 15.

Modeling and prediction of retention behavior of histidine-containing peptides in immobilized metal-affinity chromatography.

J Sep Sci. 2009 Jun;32(12):2159-69. doi: 10.1002/jssc.200800739.

Improved variable reduction in partial least squares modelling based on predictive-property-ranked variables and adaptation of partial least squares complexity.

Anal Chim Acta. 2011 Oct 31;705(1-2):292-305. doi: 10.1016/j.aca.2011.06.037. Epub 2011 Jun 29.

Review on modelling aspects in reversed-phase liquid chromatographic quantitative structure-retention relationships.

Anal Chim Acta. 2007 Oct 29;602(2):164-72. doi: 10.1016/j.aca.2007.09.014. Epub 2007 Sep 15.

Performance comparison of partial least squares-related variable selection methods for quantitative structure retention relationships modelling of retention times in reversed-phase liquid chromatography.

J Chromatogr A. 2015 Dec 11;1424:69-76. doi: 10.1016/j.chroma.2015.10.099. Epub 2015 Nov 6.

Hybrid variable selection in visible and near-infrared spectral analysis for non-invasive quality determination of grape juice.

Anal Chim Acta. 2010 Feb 5;659(1-2):229-37. doi: 10.1016/j.aca.2009.11.045. Epub 2009 Nov 26.

Cited by

In Silico High-Performance Liquid Chromatography Method Development via Machine Learning.

Anal Chem. 2025 Apr 8;97(13):6991-7001. doi: 10.1021/acs.analchem.4c03466. Epub 2025 Mar 28.

Hyperspectral imaging with machine learning for non-destructive classification of var. , , and similar seeds.

Front Plant Sci. 2022 Nov 29;13:1031849. doi: 10.3389/fpls.2022.1031849. eCollection 2022.

Descriptor Selection via Log-Sum Regularization for the Biological Activities of Chemical Structure.

Int J Mol Sci. 2017 Dec 22;19(1):30. doi: 10.3390/ijms19010030.

Physicochemical interaction of antitumor acridinone derivatives with DNA in view of QSAR studies.

Med Chem Res. 2011 Nov;20(8):1385-1393. doi: 10.1007/s00044-010-9487-y. Epub 2010 Nov 17.

Bayesian nonparametric model for the validation of peptide identification in shotgun proteomics.

Mol Cell Proteomics. 2009 Mar;8(3):547-57. doi: 10.1074/mcp.M700558-MCP200. Epub 2008 Nov 12.

Phosphopeptide elution times in reversed-phase liquid chromatography.

J Chromatogr A. 2007 Nov 16;1172(1):9-18. doi: 10.1016/j.chroma.2007.09.032. Epub 2007 Sep 18.
