Xu Jiwei, Xu Jianjie, Tong Zhaoyang, Yu Siqi, Liu Bing, Mu Xihui, Du Bin, Gao Chuan, Wang Jiang, Liu Zhiwei, Liu Dong
State Key Laboratory of NBC Protection for Civilian, Beijing, 102205, China.
Spectrochim Acta A Mol Biomol Spectrosc. 2023 Aug 5;296:122646. doi: 10.1016/j.saa.2023.122646. Epub 2023 Mar 16.
Detecting and identifying biological agents is important for monitoring environmental contamination and protecting public health. Noise contamination in fluorescence spectra is one contributor to identification uncertainty. To investigate the noise tolerance of laboratory-measured excitation-emission matrix (EEM) fluorescence spectra used as a database, the fluorescence properties of four proteinaceous biotoxin samples and ten harmless protein samples were characterized by EEM fluorescence spectroscopy. The predictive performance of models trained on the laboratory-measured fluorescence data was then tested and verified on validation data consisting of noise-contaminated spectra. Using the peak signal-to-noise ratio (PSNR) as an indicator of noise level, the potential impact of noise contamination on the characterization and discrimination of these samples was evaluated quantitatively. Different classification schemes were evaluated at different PSNR values, combining the multivariate analysis techniques of Principal Component Analysis (PCA), Random Forest (RF), and Multi-layer Perceptron (MLP) with feature descriptors based on the differential transform (DT), Fourier transform (FT), and wavelet transform (WT). We systematically analyzed the performance of the classification schemes through a case study at a PSNR of 20 and through statistical analysis over PSNR values from 1 to 100. The results show that EEM-WT spectral features reduced the number of input variables required while retaining high classification performance. EEM-FT spectral features performed worst despite having the largest number of features. The distributions of feature importance and contribution were found to be sensitive to noise contamination. The classification scheme that applied PCA before MLP, with EEM-WT as input, showed improved performance at lower PSNR.
These results indicate that robust features extracted by the corresponding techniques are critical to enhancing spectral differentiation among these samples and play an important role in eliminating the effect of noise. The study of classification schemes for discriminating protein samples from noise-contaminated spectra shows considerable potential for future developments in the rapid detection and identification of proteinaceous biotoxins based on three-dimensional fluorescence spectrometry.
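The abstract does not say how noise was injected or how the wavelet features were computed, so the following is only a minimal sketch under stated assumptions. It assumes PSNR is expressed in decibels with the common definition PSNR = 10·log10(MAX² / MSE), that the noise is additive zero-mean Gaussian, and that a one-level 2-D Haar approximation stands in for the WT descriptor. The function names and the synthetic single-peak EEM are hypothetical and are not taken from the study.

```python
import numpy as np


def psnr(clean, noisy):
    """PSNR in dB, taking the clean spectrum's maximum as the peak signal (assumed definition)."""
    mse = np.mean((clean - noisy) ** 2)
    return 10.0 * np.log10(clean.max() ** 2 / mse)


def add_noise_at_psnr(clean, target_psnr_db, rng):
    """Add zero-mean Gaussian noise whose variance corresponds to the target PSNR."""
    # From PSNR = 10*log10(MAX^2 / sigma^2): sigma = MAX / 10^(PSNR / 20)
    sigma = clean.max() / 10.0 ** (target_psnr_db / 20.0)
    return clean + rng.normal(0.0, sigma, size=clean.shape)


def haar_approximation(eem):
    """One-level 2-D Haar approximation coefficients (both dimensions must be even)."""
    rows, cols = eem.shape
    blocks = eem.reshape(rows // 2, 2, cols // 2, 2)
    # Orthonormal Haar: each coefficient is the sum of a 2x2 block divided by 2
    return blocks.sum(axis=(1, 3)) / 2.0


# Synthetic single-peak EEM on a 64 x 128 excitation/emission grid (illustrative only)
ex = np.linspace(0.0, 1.0, 64)[:, None]
em = np.linspace(0.0, 1.0, 128)[None, :]
clean_eem = np.exp(-((ex - 0.3) ** 2 + (em - 0.5) ** 2) / 0.02)

rng = np.random.default_rng(0)
noisy_eem = add_noise_at_psnr(clean_eem, 20.0, rng)
measured_psnr = psnr(clean_eem, noisy_eem)

# Flattened approximation coefficients: a quarter as many inputs as the raw EEM
features = haar_approximation(noisy_eem).ravel()
print(f"measured PSNR: {measured_psnr:.2f} dB, raw inputs: {noisy_eem.size}, WT inputs: {features.size}")
```

In this sketch the approximation band has a quarter as many values as the raw matrix, which illustrates how a wavelet descriptor can cut the number of input variables before a classifier such as PCA followed by an MLP is applied. The decomposition level and wavelet family actually used in the study are not given in the abstract.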