Deep Medicine, Nuffield Department of Women's and Reproductive Health, Oxford Martin School, John Radcliffe Hospital, University of Oxford, Oxford, OX3 9DU, UK.
Research Centre for Biomedical Engineering (RCBE), School of Mathematics, Computer Science and Engineering, City, University of London, Northampton Square, London, EC1V 0HB, UK.
Sci Rep. 2021 Jul 2;11(1):13734. doi: 10.1038/s41598-021-92850-4.
The linear relationship between optical absorbance and the concentration of analytes, as postulated by the Beer-Lambert law, is one of the fundamental assumptions on which much of the optical spectroscopy literature is explicitly or implicitly based. The common use of linear regression models such as principal component regression and partial least squares exemplifies how the linearity assumption is upheld in practical applications. However, the literature also establishes that deviations from the Beer-Lambert law can be expected when (a) the light source is far from monochromatic, (b) the concentrations of analytes are very high, and (c) the medium is highly scattering. The lack of a quantitative understanding of when such nonlinearities can become predominant, along with the mainstream use of nonlinear machine learning models in different fields, has given rise to the use of methods such as random forests, support vector regression, and neural networks in spectroscopic applications. This raises the question of whether, given the small number of samples and the high number of variables in many spectroscopic datasets, nonlinear effects are significant enough to justify the additional model complexity. In the present study, we empirically investigate this question in relation to lactate, an important biomarker. In particular, to analyze the effects of scattering matrices, three datasets were generated by varying the concentration of lactate in phosphate buffer solution, human serum, and sheep blood. A fourth dataset consisted of in vivo, transcutaneous spectra obtained from healthy volunteers in an exercise study. Linear and nonlinear models were fitted to each dataset, and measures of model performance were compared to test the assumption of linearity. To isolate the effects of high concentrations, the phosphate buffer solution dataset was augmented with six samples with very high lactate concentrations (100-600 mmol/L). 
Subsequently, three partly overlapping datasets were extracted with lactate concentrations varying between 0-11, 0-20, and 0-600 mmol/L. Again, the performance of linear and nonlinear models was compared in each dataset. This analysis did not provide any evidence of substantial nonlinearities due to high concentrations. However, the results suggest that nonlinearities may be present in scattering media, justifying the use of complex, nonlinear models.
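As a minimal illustration of the kind of linear-versus-nonlinear comparison described above (a sketch under stated assumptions, not the authors' pipeline), the code below simulates spectra that obey the Beer-Lambert law, A(λ) = ε(λ)·l·c, plus a varying interfering background, and compares the cross-validated RMSE of a linear model (principal component regression) with a simple nonlinear stand-in. k-nearest-neighbour regression is swapped in here for the random forests, support vector regression, and neural networks named in the abstract. The absorptivity bands, noise level, 0-20 mmol/L range, and helper names (`simulate_spectra`, `fit_pcr`, `fit_knn`, `cv_rmse`) are all hypothetical.

```python
import numpy as np


def simulate_spectra(conc, n_wavelengths=100, noise_sd=1e-3, seed=0):
    """Hypothetical Beer-Lambert simulation: A = eps * l * c + interferent + noise."""
    rng = np.random.default_rng(seed)
    wl = np.linspace(0.0, 1.0, n_wavelengths)           # normalised wavelength axis
    path_length = 1.0                                    # assumed path length l
    eps = 0.05 * np.exp(-((wl - 0.4) ** 2) / 0.01)       # hypothetical analyte band
    interferent = np.exp(-((wl - 0.7) ** 2) / 0.02)      # hypothetical background band
    level = rng.uniform(0.0, 1.0, size=len(conc))        # varying background amount
    noise = rng.normal(0.0, noise_sd, size=(len(conc), n_wavelengths))
    return path_length * np.outer(conc, eps) + np.outer(level, interferent) + noise


def fit_pcr(X, y, n_components):
    """Principal component regression: least squares on the top PCA scores."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_components].T                              # loadings (wavelengths x k)
    b, *_ = np.linalg.lstsq(Xc @ P, y - y_mean, rcond=None)
    coef = P @ b                                         # back to wavelength space
    return lambda X_new: (X_new - x_mean) @ coef + y_mean


def fit_knn(X, y, k=5):
    """Nonlinear stand-in: average concentration of the k nearest spectra."""
    def predict(X_new):
        d = ((X_new[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
        nearest = np.argsort(d, axis=1)[:, :k]
        return y[nearest].mean(axis=1)
    return predict


def cv_rmse(fit, X, y, n_folds=5, seed=0):
    """k-fold cross-validated root-mean-square error of a fit function."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    sq_err = []
    for test in folds:
        train = np.setdiff1d(np.arange(len(y)), test)
        predict = fit(X[train], y[train])
        sq_err.append((predict(X[test]) - y[test]) ** 2)
    return float(np.sqrt(np.concatenate(sq_err).mean()))


conc = np.random.default_rng(1).uniform(0.0, 20.0, size=60)  # hypothetical 0-20 mmol/L
X = simulate_spectra(conc)
rmse_linear = cv_rmse(lambda Xtr, ytr: fit_pcr(Xtr, ytr, n_components=2), X, conc)
rmse_knn = cv_rmse(lambda Xtr, ytr: fit_knn(Xtr, ytr, k=5), X, conc)
print(f"PCR RMSE: {rmse_linear:.3f} mmol/L, kNN RMSE: {rmse_knn:.3f} mmol/L")
```

Because these synthetic spectra are linear in concentration by construction, the linear model would be expected to win here. Adding a nonlinear term to `simulate_spectra` (for example, a saturating response at high concentration) is one way to probe when the extra flexibility of a nonlinear model starts to pay off.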