Li Boyan, Calvet Amandine, Casamayou-Boucau Yannick, Ryder Alan G
Nanoscale Biophotonics Laboratory, School of Chemistry, National University of Ireland, Galway, Galway, Ireland.
Anal Chim Acta. 2016 Mar 24;913:111-20. doi: 10.1016/j.aca.2016.01.042. Epub 2016 Jan 27.
A new, fully automated, rapid method, referred to as kernel principal component analysis residual diagnosis (KPCARD), is proposed for removing cosmic ray artifacts (CRAs) from Raman spectra, in particular from large Raman imaging datasets. KPCARD identifies CRAs via a statistical analysis of the residuals obtained at each wavenumber in the spectra. The method exploits the stochastic nature of CRAs: because they occur randomly, the most significant components in a principal component analysis (PCA) of a large number of Raman spectra should not contain any CRAs. The process works by first applying kernel PCA (kPCA) to all the Raman mapping data and then accurately estimating the inter- and intra-spectrum noise to generate two threshold values. CRAs are identified by using these threshold values to evaluate the residuals of each spectrum and assess whether a CRA is present. CRAs are corrected by spectral replacement: the nearest neighbor (NN) spectrum, that is, the spectrum most spectroscopically similar to the CRA-contaminated spectrum, and the principal components (PCs) obtained by kPCA are used together to generate a robust best curve fit to the CRA-contaminated spectrum. This best-fit spectrum then replaces the CRA-contaminated spectrum in the dataset. KPCARD efficacy was demonstrated using simulated data and real Raman spectra collected from solid-state materials. The results showed that KPCARD was fast (<1 min per 8400 spectra), accurate, precise, and suitable for the automated correction of very large (>1 million spectra) Raman datasets.
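The residual-diagnosis idea can be illustrated with a minimal sketch. This is not the authors' KPCARD implementation. It swaps in ordinary linear PCA with a single component (found by power iteration) in place of kernel PCA, uses a median plus scaled median-absolute-deviation rule as a stand-in for the paper's two noise thresholds, and replaces flagged points with the low-rank model value rather than performing the NN-assisted curve fit. All function names, the multiplier `k`, and the simulated data are assumptions made for illustration.

```python
import math
import random
import statistics

def first_component(rows, iters=30):
    """Leading principal direction of mean-centred rows (power iteration)."""
    n_ch = len(rows[0])
    v = [1.0 / math.sqrt(n_ch)] * n_ch
    for _ in range(iters):
        scores = [sum(r[j] * v[j] for j in range(n_ch)) for r in rows]
        w = [sum(scores[i] * rows[i][j] for i in range(len(rows)))
             for j in range(n_ch)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def robust_limit(values, k):
    """Median + k * robust sigma (1.4826 * MAD): an assumed threshold rule."""
    med = statistics.median(values)
    mad = statistics.median([abs(x - med) for x in values])
    return med + k * 1.4826 * mad

def detect_and_correct(spectra, k=8.0):
    """Flag points whose residual exceeds both thresholds, then replace them."""
    n, m = len(spectra), len(spectra[0])
    mean = [sum(s[j] for s in spectra) / n for j in range(m)]
    centred = [[s[j] - mean[j] for j in range(m)] for s in spectra]
    v = first_component(centred)
    scores = [sum(r[j] * v[j] for j in range(m)) for r in centred]
    # Rank-1 model of every spectrum; spikes are left in the residuals.
    model = [[mean[j] + scores[i] * v[j] for j in range(m)] for i in range(n)]
    resid = [[spectra[i][j] - model[i][j] for j in range(m)] for i in range(n)]
    # Inter-spectrum limit: one per wavenumber channel, across all spectra.
    inter = [robust_limit([resid[i][j] for i in range(n)], k) for j in range(m)]
    # Intra-spectrum limit: one per spectrum, across all channels.
    intra = [robust_limit(resid[i], k) for i in range(n)]
    flagged = {(i, j) for i in range(n) for j in range(m)
               if resid[i][j] > inter[j] and resid[i][j] > intra[i]}
    cleaned = [row[:] for row in spectra]
    for i, j in flagged:
        cleaned[i][j] = model[i][j]  # simplified replacement, not the NN fit
    return flagged, cleaned

# Simulated map: one Gaussian band of varying intensity plus white noise.
random.seed(0)
N, M = 200, 60
peak = [math.exp(-0.5 * ((j - 30) / 6.0) ** 2) for j in range(M)]
amplitudes = [random.uniform(5.0, 15.0) for _ in range(N)]
clean = [[a * p + random.gauss(0.0, 0.1) for p in peak] for a in amplitudes]

spectra = [row[:] for row in clean]
spectra[17][10] += 20.0  # inject one synthetic cosmic ray spike

flagged, cleaned = detect_and_correct(spectra)
print(sorted(flagged))
print(spectra[17][10], cleaned[17][10])
```

The sketch relies on the same premise as the abstract: with many spectra, a single random spike contributes too little variance to enter the leading component, so it survives in the residuals where it can be thresholded. Requiring a residual to exceed both the per-channel and the per-spectrum limit is one simple way to combine the two noise estimates. The paper's actual threshold construction may differ.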