Zarei Mahsa, Solomatova Natalia V, Aghaei Hoda, Rothwell Austin, Wiens Jeffrey, Melo Luke, Good Travis G, Shokatian Sadegh, Grant Edward
Department of Chemistry, The University of British Columbia, 2036 Main Mall, Vancouver, British Columbia V6T 1Z1, Canada.
Miraterra Technologies Corporation, 199 W 6th Ave, Vancouver, British Columbia V5Y 1K3, Canada.
Anal Chem. 2023 Oct 31;95(43):15908-15916. doi: 10.1021/acs.analchem.3c02348. Epub 2023 Sep 12.
Important decisions in local agricultural policy and practice often hinge on the soil's chemical composition. Raman spectroscopy offers a rapid, noninvasive means to quantify the constituents of complex organic systems. However, applying Raman spectroscopy to soils presents a multifaceted challenge because of organic/mineral compositional complexity and spectral interference arising from overwhelming fluorescence. The present work compares methodologies that can help overcome common obstacles in the analysis of soils. We created conditions representative of these challenges by combining varying proportions of six amino acids commonly found in soils with fluorescent bentonite clay and coarse mineral components. Referring to an extensive data set of Raman spectra, we compare the performance of convolutional neural network (CNN) and partial least-squares regression (PLSR) multivariate models for amino acid composition. Strategies employing volume-averaged spectral sampling and data preprocessing algorithms improve the predictive power of these models. Our average test R² for PLSR models exceeds 0.89 and approaches 0.98, depending on the complexity of the matrix, whereas CNN yields an R² range from 0.91 to 0.97, demonstrating that classic PLSR and CNN perform comparably, except in cases where the signal-to-noise ratio of the organic component is very low, where CNN models outperform PLSR. By artificially isolating two of the most prevalent obstacles in evaluating the Raman spectra of soils, we have characterized the effect of each obstacle on the performance of machine learning models in the absence of other complexities. These results highlight important considerations and modeling strategies necessary to improve the Raman analysis of organic compounds in complex mixtures in the presence of mineral spectral components and significant fluorescence.