Coca-Lopez Nicolas
Instituto de Catálisis y Petroleoquímica (ICP), CSIC, Marie Curie, 2, Madrid, 28049, Spain.
Anal Chim Acta. 2024 Mar 22;1295:342312. doi: 10.1016/j.aca.2024.342312. Epub 2024 Feb 2.
Raman spectroscopists are familiar with the challenge of dealing with spikes caused by cosmic rays. These artifacts may lead to errors in subsequent data processing steps such as calibration, normalization or spectral search. Spike removal is therefore a fundamental step in Raman spectral data pre-treatment, but publicly accessible code for spike removal is scarce, and the performance of the available tools for spectra correction is often unknown. There is therefore a need to develop and test open-source, easy-to-implement algorithms that improve the Raman data processing workflow.
In this work, we present and validate two approaches for spike detection and correction in Raman spectral data from graphene: i) an algorithm based on the peaks' widths and prominences and ii) an algorithm based on the ratio of these two peak features. The first algorithm provides an efficient and reliable approach for spike detection in real and synthetic Raman spectra by imposing thresholds on the peaks' width and prominence. The second approach uses the prominence/width ratio for outlier detection. It relies on a limit of detection calculated from either one or several spectra, which enables the automatic identification of both cosmic ray spikes and low-intensity spikes originating from noise. In spectra with different noise levels, both algorithms detect low-intensity spikes down to at least ≈10% of the intensity of the highest Raman peak. To address their limitations and prove their versatility, the algorithms were further tested on Raman spectra from calcite and polystyrene.
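The two detection ideas can be illustrated with a minimal Python sketch. It is not the published implementation: it assumes SciPy's `find_peaks` as the peak finder, and the threshold values, the limit `mean + n_sigma * std` (a stand-in for the limit of detection), the linear-interpolation correction, the function names and the synthetic spectrum are all illustrative assumptions.

```python
import numpy as np
from scipy.signal import find_peaks


def detect_spikes_threshold(y, max_width=3.0, min_prominence=100.0):
    """Approach (i): keep peaks that are narrow AND prominent.

    Both threshold values are assumptions chosen for the synthetic data below.
    """
    peaks, props = find_peaks(y, prominence=min_prominence, width=(None, max_width))
    return peaks, props


def detect_spikes_ratio(y, n_sigma=3.0):
    """Approach (ii): flag peaks whose prominence/width ratio is an outlier.

    The limit (mean + n_sigma * std over all peaks of one spectrum) is an
    assumed stand-in for the limit of detection mentioned in the text.
    """
    peaks, props = find_peaks(y, prominence=0, width=0)
    ratio = props["prominences"] / props["widths"]
    limit = ratio.mean() + n_sigma * ratio.std()
    return peaks[ratio > limit], limit


def interpolate_over(y, props):
    """Replace flagged points by linear interpolation (assumed correction)."""
    bad = np.zeros(y.size, dtype=bool)
    for left, right in zip(props["left_ips"], props["right_ips"]):
        # Mask every sample touched by the peak at half prominence.
        bad[int(np.floor(left)):int(np.ceil(right)) + 1] = True
    x = np.arange(y.size)
    fixed = y.copy()
    fixed[bad] = np.interp(x[bad], x[~bad], y[~bad])
    return fixed


# Synthetic spectrum: two broad Lorentzian bands, Gaussian noise, one spike.
rng = np.random.default_rng(0)
x = np.arange(1000)


def lorentz(x0, amp, hwhm):
    return amp * hwhm**2 / ((x - x0) ** 2 + hwhm**2)


spectrum = lorentz(200, 1000, 20) + lorentz(400, 600, 20) + rng.normal(0, 2, x.size)
spectrum[700] += 500  # one-sample synthetic spike

spikes_i, props_i = detect_spikes_threshold(spectrum)
spikes_ii, limit = detect_spikes_ratio(spectrum)
corrected = interpolate_over(spectrum, props_i)
print(spikes_i, spikes_ii)
```

In this sketch the broad bands are rejected by approach (i) because their width at half prominence far exceeds `max_width`, while approach (ii) needs no hand-set thresholds but depends on how the limit is estimated. A mean-plus-standard-deviation limit taken from a single spectrum is itself inflated by strong spikes, so weaker spikes may require a limit pooled over several spectra, as the text suggests.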
Our intuitive, open-source algorithms have been validated and allow automatic correction for a given set of samples. They do not require any pre-processing steps such as calibration or baseline subtraction, and their implementation with Python libraries is computationally efficient, allowing for immediate utilization within existing open-source packages for Raman spectra processing.