Guo Shuxia, Bocklitz Thomas, Popp Jürgen
Institute of Physical Chemistry and Abbe School of Photonics, Friedrich-Schiller-University, Jena, Helmholtzweg 4, D-07743 Jena, Germany.
Analyst. 2016 Apr 21;141(8):2396-404. doi: 10.1039/c6an00041j. Epub 2016 Feb 24.
Over the last decade, Raman spectroscopy has become an invaluable tool for biomedical diagnostics. However, manually rating the subtle spectral differences between normal and abnormal disease states is neither possible nor practical. It is therefore necessary to combine Raman spectroscopy with chemometrics in order to build statistical models that predict disease states directly, without manual intervention. Within chemometric analysis, a number of corrections have to be applied to obtain robust models. Baseline correction is an important pre-processing step: it should remove spectral contributions from fluorescence and improve the performance and robustness of statistical models. However, selecting an optimal baseline correction method and its parameters each time a new dataset is analyzed is demanding, time-consuming, and dependent on expert knowledge. To circumvent this issue, we propose a genetic algorithm based method that automatically optimizes the baseline correction. The investigation was carried out in three main steps. First, a numerical quantitative marker was defined to evaluate the quality of the baseline estimation. Second, a genetic algorithm based methodology was established to search for the optimal baseline estimation, with the defined quantitative marker as the evaluation function. Finally, classification models were used to benchmark the performance of the optimized baseline. For comparison, model-based baseline optimization was carried out with the same classifiers. The results showed that our method can provide a semi-optimal and stable baseline estimation without requiring any chemical knowledge or additional spectral information.
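The first two steps described in the abstract (a quantitative marker, then a genetic algorithm search driven by it) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the baseline estimator is a simple iterative polynomial fit, the marker `marker` is a hypothetical stand-in (the abstract does not define the actual one), the searched parameters are only the polynomial order and the iteration count, and the spectrum is synthetic. The third step, benchmarking with classification models, is omitted.

```python
import numpy as np

def synthetic_spectrum(n=300, seed=1):
    """Toy spectrum: three Gaussian peaks on a smooth background plus noise."""
    rng = np.random.default_rng(seed)
    x = np.linspace(-1.0, 1.0, n)
    peaks = sum(a * np.exp(-0.5 * ((x - c) / 0.02) ** 2)
                for a, c in [(3, -0.5), (5, 0.1), (2, 0.6)])
    background = 10 + 4 * x + 3 * x ** 2
    return x, peaks + background + rng.normal(0, 0.05, n), background

def modpoly_baseline(x, y, order, n_iter):
    """Iterative polynomial fit; points above the fit are clipped each round."""
    work = y.copy()
    for _ in range(n_iter):
        fit = np.polyval(np.polyfit(x, work, order), x)
        work = np.minimum(work, fit)
    return fit

def marker(y, baseline):
    """Hypothetical quality marker (higher is better), NOT the paper's marker.
    Penalizes a baseline that rises above the spectrum and leftover background."""
    r = y - baseline
    return -(10.0 * np.mean(np.minimum(r, 0.0) ** 2) + np.mean(np.abs(r)))

def run_ga(x, y, pop_size=12, generations=15, seed=0):
    """Small genetic algorithm over (polynomial order, iteration count)."""
    rng = np.random.default_rng(seed)
    lo, hi = np.array([1, 1]), np.array([6, 30])  # assumed search bounds

    def fitness(ind):
        return marker(y, modpoly_baseline(x, y, int(ind[0]), int(ind[1])))

    def tournament(pop, scores):
        i, j = rng.integers(0, len(pop), size=2)
        return pop[i] if scores[i] >= scores[j] else pop[j]

    pop = rng.integers(lo, hi + 1, size=(pop_size, 2))
    history = []
    for _ in range(generations):
        scores = np.array([fitness(ind) for ind in pop])
        history.append(float(scores.max()))
        children = [pop[np.argmax(scores)].copy()]  # elitism: keep the best
        while len(children) < pop_size:
            p1, p2 = tournament(pop, scores), tournament(pop, scores)
            mask = rng.integers(0, 2, size=2).astype(bool)  # uniform crossover
            child = np.where(mask, p1, p2)
            if rng.random() < 0.3:  # mutation: shift each gene by -1, 0 or +1
                child = child + rng.integers(-1, 2, size=2)
            children.append(np.clip(child, lo, hi))
        pop = np.array(children)
    scores = np.array([fitness(ind) for ind in pop])
    history.append(float(scores.max()))
    best = pop[np.argmax(scores)]
    return (int(best[0]), int(best[1])), history

if __name__ == "__main__":
    x, y, _ = synthetic_spectrum()
    (order, n_iter), history = run_ga(x, y)
    print("selected order:", order, "iterations:", n_iter)
```

Because the best individual is carried over unchanged into each new generation, the best marker value recorded in `history` never decreases. In a real setting the chromosome would also encode the choice of baseline method itself, and the marker would need to be validated against downstream classification performance, as the abstract describes.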