Department of Chemical and Life Sciences Engineering, Virginia Commonwealth University, Richmond, Virginia 23284, United States.
Medicines for All Institute, Virginia Commonwealth University, Richmond, Virginia 23284, United States.
J Chem Inf Model. 2024 Jul 8;64(13):5006-5015. doi: 10.1021/acs.jcim.4c00359. Epub 2024 Jun 19.
This work reports a new modeling approach with broad utility for quantitative spectroscopy. A primary objective is to create a modeling procedure that may allow greater automation of the model development process. The fundamental concept is simple yet powerful even for complex spectra, and it is applied with no additional preprocessing. The approach is applicable to several types of spectroscopic data and yields regression models of similar or greater quality than current methods. The key modeling steps are a matrix transformation and a subsequent feature selection process, collectively referred to as iterative regression of corrective baselines (IRCB). The transformed matrix is a linearized form of the original data set. Features from the transformed matrix that are predictive of the response variable can be ranked and selected by ordinary least-squares regression. The best features (rows of the transformed matrix) are linear depictions of the response variable that can be used to develop regression models with several machine learning algorithms. The IRCB workflow is first detailed using a case study of Fourier transform infrared (FTIR) spectroscopy for prepared solutions of a three-component mixture. Next, IRCB is applied to, and compared with benchmark results for, the 2006 "Chimiométrie" near-infrared spectroscopy (NIR) soil composition challenge and Raman measurements of a simulated nuclear waste slurry.
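The feature-ranking step mentioned in the abstract can be illustrated with a minimal sketch. It does not implement the IRCB transformation itself, which the abstract does not specify. It assumes only that a transformed matrix is already available with one candidate feature per row, and it ranks the rows by the R² of a one-variable ordinary least-squares fit against the quantity being predicted. The names `T`, `y`, and `rank_rows_by_ols`, the use of R² as the ranking score, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def rank_rows_by_ols(T, y):
    """Rank rows of T by the R^2 of a one-variable OLS fit y ~ a*row + b.

    T : array of shape (n_rows, n_samples), one candidate feature per row
        (a hypothetical stand-in for a transformed matrix).
    y : array of shape (n_samples,), the quantity being predicted.
    Returns (order, scores): row indices sorted best-first, and R^2 per row.
    """
    T = np.asarray(T, dtype=float)
    y = np.asarray(y, dtype=float)
    ss_tot = np.sum((y - y.mean()) ** 2)  # total sum of squares of y
    scores = np.empty(T.shape[0])
    for i, row in enumerate(T):
        # Design matrix with the feature and an intercept column.
        A = np.column_stack([row, np.ones_like(row)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        scores[i] = 1.0 - np.sum(resid ** 2) / ss_tot
    order = np.argsort(scores)[::-1]  # highest R^2 first
    return order, scores

# Synthetic demonstration (not data from the paper): six random rows,
# with row 3 overwritten so that it is nearly linear in y.
rng = np.random.default_rng(0)
n_samples, n_rows = 40, 6
y = rng.uniform(0.0, 1.0, n_samples)
T = rng.normal(size=(n_rows, n_samples))
T[3] = 2.0 * y + 0.5 + rng.normal(scale=0.01, size=n_samples)

order, scores = rank_rows_by_ols(T, y)
print("rows ranked best-first:", order)
```

In this toy setting the planted row should rank first because it is almost an exact linear function of `y`. The top-ranked rows could then be passed as inputs to any regression model, which is the role the abstract assigns to the selected features.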