Lim Zhen Jia, Žuvela Petar, Ukić Šime, Novak Stankov Mirjana, Bolanča Tomislav, Lovrić Mario, Wong Ming Wah, Buszewski Bogusław
Department of Chemistry, National University of Singapore, 3 Science Drive 3, Singapore 117543, Singapore.
Department of Analytical Chemistry, Faculty of Chemical Engineering and Technology, University of Zagreb, Marulićev trg 19, Zagreb 10000, Croatia.
ACS Omega. 2025 Feb 4;10(6):5993-6002. doi: 10.1021/acsomega.4c09868. eCollection 2025 Feb 18.
Quantitative structure-retention relationships (QSRRs) are a popular modeling approach in ion chromatography for predicting retention time from molecular structure. They are often coupled with solvent strength models to extend predictions to other isocratic chromatographic conditions. While this approach has achieved reasonable success, inconsistencies in the solvent strength model may propagate to the QSRR models and amplify their errors. In this work, we aim to incorporate information on the isocratic conditions directly into the QSRR model to reduce error propagation and build global models. Four machine learning approaches that can account for both global and local sources of variability in chromatographic retention were evaluated and compared: random forest regression, gradient boosting regression (GBR), extreme gradient boosting (xgBoost), and adaptive boosting (AdaBoost). A partial least-squares model was built as a baseline. GBR and xgBoost showed the best predictive ability among the evaluated models, with root-mean-square errors (RMSEs) of isocratic retention of 0.025 (+0.009, -0.006) and 0.025 (+0.008, -0.006), respectively. The developed QSRR models were then incorporated into an isocratic-to-gradient model to predict gradient retention. The GBR and xgBoost QSRR models again outperformed the other models, with RMSEs of gradient retention of 0.358 (+0.199, -0.107) and 0.385 (+0.387, -0.139) min, respectively. This approach demonstrates the benefit of incorporating the eluent composition into prediction models and could potentially be extended to other chromatographic techniques.