Key Laboratory of Photoelectronic Imaging Technology and System of Ministry of Education of China, School of Optics and Photonics, Beijing Institute of Technology, Beijing, 100081, China.
School of Medical Technology, Beijing Institute of Technology, Beijing, 100081, China.
Anal Sci. 2024 Dec;40(12):2101-2109. doi: 10.1007/s44211-024-00645-0. Epub 2024 Aug 29.
Dedicated machine learning (ML) and deep learning (DL) algorithms are a key factor pushing the frontiers of biomedical Raman spectroscopy (RS). Yet no systematic comparison of ML and DL algorithms has been conducted for biomedical RS, largely because large open-source Raman spectral datasets are scarce. We therefore compared a typical ML method, partial least squares-discriminant analysis (PLS-DA), with a DL method, a one-dimensional convolutional neural network (1D-CNN), for pathogenic microbe identification on 12,000 Raman spectra from six species of microbe (i.e., K. aerogenes (Klebsiella aerogenes), C. albicans (Candida albicans), C. glabrata (Candida glabrata), Group A Strep. (Group A Streptococcus), E. coli1 (Escherichia coli1), E. coli2 (Escherichia coli2)), with 100%, 75%, 50% and 25% of the 12,000 Raman spectra retained. The total Raman dataset was split 80% for training and 20% for testing. With 100% of the spectra retained, the testing accuracy and the area under the curve (AUC) of the receiver operating characteristic (ROC) curve were 95.25% and 0.997 for 1D-CNN, higher than those of PLS-DA (89.42% and 0.979). Yet PLS-DA outperformed 1D-CNN on the testing datasets with 75%, 50% and 25% of the spectra retained. These accuracies and AUCs demonstrate that the performance of both PLS-DA and 1D-CNN depends on the number of Raman spectra. In addition, both the loadings on the latent variables of PLS-DA and the saliency maps of 1D-CNN largely captured Raman peaks arising from DNA and proteins, with comparable interpretability. These results indicate that both ML and DL algorithms should be explored for each Raman spectra identification application, selecting whichever yields the higher accuracy and AUC.