Zhang Linna, Ding Hongyan, Wang Yimin, Guo Xin, Li Hong
Faculty of Mechanical & Material Engineering, Huaiyin Institute of Technology, Huai'an 223003, China.
Spectrochim Acta A Mol Biomol Spectrosc. 2020 Feb 15;227:117750. doi: 10.1016/j.saa.2019.117750. Epub 2019 Nov 5.
Near-infrared (NIR) spectroscopy is widely used for composition analysis in fields such as food, pharmaceuticals, and the environment. The ratio of sample size to the number of wavelengths used strongly affects the performance of a calibration model. In this research, we explored how the ratio of sample size to the number of wavelengths (SWR) influences calibration model performance, using hemoglobin determination as an example. When SWR was less than 0.5, that is, when the calibration set contained fewer samples than half the number of wavelengths used to build the model, RMSEC increased with increasing SWR while RMSEP decreased. In this SWR range, the calibration model lacked reliability. RMSEC and RMSEP tended to stabilize when SWR exceeded 0.9. In most practical cases, however, sample size is limited, and wavelength selection is commonly used in spectroscopic analysis. To confirm that the SWR effect arises from both sample size and wavelength number, we also studied calibration model performance at different SWR values obtained by varying the number of wavelengths, which were selected with the equidistant combination multiple linear regression (ECMLR) method. These results were consistent with the first part: when establishing a calibration model, the number of wavelengths used should be less than twice the number of samples in the calibration set to ensure model validity. We therefore consider wavelength selection indispensable when the sample size is small. This research may provide evidence and guidance for other studies that use spectroscopic methods.
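The qualitative mechanism behind the SWR effect can be illustrated with a minimal sketch. The abstract does not state which regression model was used, so the code below assumes plain multiple linear regression fitted by least squares on hypothetical synthetic spectra (an analyte band and an interfering band plus noise). The names `make_synthetic_spectra`, `fit_mlr`, and `evaluate_swr` are illustrative, not the authors' code. The sketch only shows why a low SWR makes RMSEC untrustworthy: once the calibration set has fewer samples than model coefficients, least squares can fit the calibration data exactly, so RMSEC collapses toward zero and says nothing about RMSEP. It is not expected to reproduce the 0.5 and 0.9 thresholds reported above.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root-mean-square error between reference and predicted values."""
    return float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)))


def make_synthetic_spectra(n_samples, n_wavelengths, rng):
    """Hypothetical stand-in for NIR spectra: an analyte band and an
    interfering band, each scaled by a random concentration, plus noise."""
    x = np.linspace(0.0, 1.0, n_wavelengths)
    analyte_band = np.exp(-((x - 0.5) ** 2) / 0.02)
    interferent_band = np.exp(-((x - 0.2) ** 2) / 0.05)
    conc = rng.uniform(0.5, 1.5, n_samples)  # reference values to predict
    interferent = rng.uniform(0.5, 1.5, n_samples)
    noise = rng.normal(0.0, 0.01, (n_samples, n_wavelengths))
    spectra = np.outer(conc, analyte_band) + np.outer(interferent, interferent_band) + noise
    return spectra, conc


def fit_mlr(X, y):
    """Least-squares MLR with an intercept column. With fewer samples than
    coefficients, np.linalg.lstsq returns the minimum-norm solution, which
    fits the calibration data exactly when the rows are linearly independent."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict(coef, X):
    """Apply intercept plus per-wavelength coefficients."""
    return coef[0] + X @ coef[1:]


def evaluate_swr(n_cal, n_wavelengths, n_val=50, seed=0):
    """Return (SWR, RMSEC, RMSEP) for one calibration-set size."""
    rng = np.random.default_rng(seed)
    X, y = make_synthetic_spectra(n_cal + n_val, n_wavelengths, rng)
    X_cal, y_cal, X_val, y_val = X[:n_cal], y[:n_cal], X[n_cal:], y[n_cal:]
    coef = fit_mlr(X_cal, y_cal)
    swr = n_cal / n_wavelengths
    return swr, rmse(y_cal, predict(coef, X_cal)), rmse(y_val, predict(coef, X_val))


if __name__ == "__main__":
    # Sweep the calibration-set size at a fixed number of wavelengths.
    for n_cal in (20, 50, 90, 200, 400):
        swr, rmsec, rmsep = evaluate_swr(n_cal, n_wavelengths=100)
        print(f"SWR={swr:.2f}  RMSEC={rmsec:.2e}  RMSEP={rmsep:.2e}")
```

Under these assumptions, RMSEC drops to roughly machine precision whenever the calibration set is smaller than the number of coefficients (wavelengths plus intercept), while RMSEP stays clearly nonzero. Only in the overdetermined regime does RMSEC become a meaningful estimate.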
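The abstract names ECMLR but does not describe it. In the NIR literature, the method is usually presented as follows, though this is our assumption and is not stated here. It scans equidistant wavelength combinations defined by a starting index I, a number of wavelengths N, and a gap G. It then fits an MLR model for each combination and keeps the one with the lowest prediction error. The sketch below implements that grid search with hypothetical names (`ecmlr_search`, `max_n`, `max_gap`). The cap of N + 1 coefficients at or below the number of calibration samples is our own choice, so that every MLR fit is overdetermined. This is a stricter condition than the abstract's rule of fewer than twice as many wavelengths as calibration samples.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root-mean-square error."""
    return float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)))


def mlr_predict(X_cal, y_cal, X_new):
    """Fit MLR (with intercept) by least squares on X_cal and predict X_new."""
    A_cal = np.column_stack([np.ones(len(X_cal)), X_cal])
    coef, *_ = np.linalg.lstsq(A_cal, y_cal, rcond=None)
    return coef[0] + X_new @ coef[1:]


def ecmlr_search(X_cal, y_cal, X_val, y_val, max_n, max_gap):
    """Hypothetical ECMLR-style search over equidistant wavelength sets.

    A candidate set is the column-index sequence I, I+G, ..., I+(N-1)G.
    Returns (rmsep, I, N, G) for the candidate with the lowest RMSEP.
    """
    n_cal, n_wl = X_cal.shape
    best = None
    for n in range(1, max_n + 1):
        # Assumed constraint: N wavelengths + intercept <= calibration samples.
        if n + 1 > n_cal:
            break
        gaps = [1] if n == 1 else range(1, max_gap + 1)  # G is irrelevant when N = 1
        for gap in gaps:
            # The last selected index, start + (n - 1) * gap, must stay below n_wl.
            for start in range(n_wl - (n - 1) * gap):
                idx = start + gap * np.arange(n)
                pred = mlr_predict(X_cal[:, idx], y_cal, X_val[:, idx])
                score = rmse(y_val, pred)
                if best is None or score < best[0]:
                    best = (score, start, n, gap)
    return best


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(60, 40))  # 60 hypothetical spectra, 40 wavelengths
    y = 2.0 * X[:, 5] - X[:, 15] + rng.normal(0.0, 0.01, 60)
    score, start, n, gap = ecmlr_search(X[:40], y[:40], X[40:], y[40:], max_n=3, max_gap=10)
    print(f"I={start}, N={n}, G={gap}, RMSEP={score:.4f}")
```

Because the combination is chosen by its validation RMSEP, that RMSEP is optimistically biased. An independent test set would be needed for an unbiased estimate.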