[Construction of a machine learning ensemble prediction model for gas chromatographic retention index on stationary phases with different polarities].

Authors

Wang Qian-Yi, Zhu Yong-le, Li Xue-Hua

Affiliation

Key Laboratory of Industrial Ecology and Environmental Engineering, Ministry of Education, School of Environmental Science and Technology, Dalian University of Technology, Dalian 116024, China.

Publication information

Se Pu. 2025 Apr 8;43(4):355-362. doi: 10.3724/SP.J.1123.2024.07014.

DOI:10.3724/SP.J.1123.2024.07014
PMID:40133201
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11966378/
Abstract

Gas chromatography is an analytical technique widely used to separate and identify compounds. The retention index (RI) plays a significant role in gas chromatography: it provides a standardized measure of a compound's retention under specific conditions and is a powerful compound-identification tool, particularly for complex mixtures. Predicting RI values is therefore a meaningful objective, particularly across stationary phases of differing polarity, because RI varies significantly from one polar stationary phase to another. To address this issue, we developed a model for predicting gas-chromatographic RIs on stationary phases of varying polarity, using 4183 retention-index data points for 2499 compounds on eight types of stationary phase collected from the literature and databases. The stationary phases were further classified into five categories based on their McReynolds constants: strongly polar, polar, medium polar, weakly polar, and non-polar. This classification allows the model to handle a wide range of polarities, which enhances its versatility and applicability to various analytical scenarios. The predictive model was constructed by integrating two types of feature. First, the 1D and 2D molecular-structural features of the compounds were determined; these capture the chemical and physical properties of the compounds, including relative molecular mass, functional groups, and topological indices, and so describe the molecular characteristics that influence retention behavior. Second, stationary-phase polarity was one-hot encoded, which converts the categorical polarity information into a format that machine-learning algorithms can use and allows the model to distinguish the effects of the different polarities on retention behavior.

Nine algorithms were used to construct predictive machine-learning models, including linear regression, decision tree, random forest, support vector machine (SVM), k-nearest-neighbor (KNN), gradient-boosting decision tree (GBDT), extreme gradient boosting (XGBoost), and light gradient boosting machine (LightGBM) algorithms. Voting regression was then used to build the best-performing ensemble learning model from the XGBoost and LightGBM algorithms. This ensemble model, which combines the strengths of its individual models, achieved a training-set coefficient of determination (R²) of 0.99, a training-set root mean square error (RMSE) of 101.85, a test-set R² of 0.97, and a test-set RMSE of 107.44. Williams plots were used to characterize the applicability domain of the model; over 94% of the data lie within the domain, indicating broad applicability and high predictive confidence. By integrating machine-learning techniques with comprehensive chemical- and physical-property data, the model predicts RI values with high accuracy across stationary phases spanning a wide range of polarity, and the ensemble model is more robust and more predictive than the individual machine-learning models. The model has scientific significance and practical value for improving the efficiency and accuracy of target and non-target gas-chromatographic analyses.
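The feature construction and ensemble step described in the abstract can be sketched in a few lines. This is a minimal pure-Python illustration under stated assumptions, not the authors' implementation: the function names, the single toy descriptor (relative molecular mass), and the two stand-in base models are hypothetical, and the abstract does not say which libraries or hyperparameters were used or whether the votes were weighted. The sketch assumes an unweighted average of the base predictions.

```python
import math

# The five polarity classes named in the abstract; the ordering is arbitrary.
POLARITY_CLASSES = ["strongly polar", "polar", "medium polar",
                    "weakly polar", "non-polar"]


def one_hot_polarity(label):
    """Return a 0/1 vector with a single 1 at the index of the given class."""
    if label not in POLARITY_CLASSES:
        raise ValueError(f"unknown polarity class: {label!r}")
    return [1.0 if label == c else 0.0 for c in POLARITY_CLASSES]


def build_features(descriptors, polarity):
    """Concatenate molecular descriptors with the one-hot polarity vector."""
    return list(descriptors) + one_hot_polarity(polarity)


def voting_predict(models, x):
    """Unweighted voting regression: average the base models' predictions."""
    preds = [model(x) for model in models]
    return sum(preds) / len(preds)


def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical stand-ins for two fitted base learners (assumption: in the
# paper these roles are played by XGBoost and LightGBM regressors).
# x[0] is the molecular mass, x[1] is the "strongly polar" flag.
def model_a(x):
    return 600.0 + 2.0 * x[0] + 150.0 * x[1]


def model_b(x):
    return 620.0 + 1.8 * x[0] + 170.0 * x[1]


if __name__ == "__main__":
    x = build_features([100.0], "strongly polar")
    print("features:", x)
    print("ensemble RI estimate:", voting_predict([model_a, model_b], x))
```

With real data, the stand-in functions would be replaced by fitted regressors, and `rmse` and `r2` would be evaluated separately on the training and test sets, as reported in the abstract.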

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/69fe/11966378/a9b72d46209a/img_1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/69fe/11966378/3f31a49a33cd/img_2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/69fe/11966378/3d390cbb94ac/img_3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/69fe/11966378/3b87f43ad5df/img_4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/69fe/11966378/d659e6e13cae/img_5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/69fe/11966378/8d5979bf8d36/img_6.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/69fe/11966378/f2b5b2c79f12/img_7.jpg