Lu Songlin, Huang Yuanfang, Shen Wan Xiang, Cao Yu Lin, Cai Mengna, Chen Yan, Tan Ying, Jiang Yu Yang, Chen Yu Zong
The State Key Laboratory of Chemical Oncogenomics, Key Laboratory of Chemical Biology, Tsinghua Shenzhen International Graduate School, Tsinghua University, 2279 Lishui Road, Nanshan District, Shenzhen 518055, Guangdong, P. R. China.
Institute of Biomedical Health Technology and Engineering, Shenzhen Bay Laboratory, 9 Kexue Avenue, Guangming District, Shenzhen 518132, Guangdong, P. R. China.
PNAS Nexus. 2024 Jul 3;3(8):pgae268. doi: 10.1093/pnasnexus/pgae268. eCollection 2024 Aug.
Feature representation is critical for data learning, particularly for spectroscopic data. Machine learning (ML) and deep learning (DL) models that learn from Raman spectra enable rapid, nondestructive, and label-free cell phenotype identification, which facilitates diagnostic, therapeutic, forensic, and microbiological applications. However, these models are challenged by spectroscopic data that are high-dimensional, unordered, and limited in sample size. Here, we introduce novel 2D image-like dual signal and component aggregated representations, constructed by restructuring Raman spectra and principal components. These representations enable spectroscopic DL for enhanced cell phenotype and signature identification. The new ConvNet models, DSCARNets, significantly outperformed state-of-the-art (SOTA) ML and DL models on six benchmark datasets. In most cases they improved on SOTA accuracies of 85-97% by more than 2%. DSCARNets also performed well on four additional datasets, where they were compared with SOTA models of extremely high performance (>98%). They also performed well on two datasets for which no supervised phenotype classification model had been published. Explainable DSCARNets identified Raman signatures consistent with experimental indications.
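The abstract does not say how spectra and principal components are restructured into 2D images. The following Python/NumPy sketch is a rough illustration of the general idea of turning 1D spectra into two-channel, image-like inputs for a ConvNet. It assumes one possible scheme, and that scheme is not the authors' DSCAR construction. In the assumed scheme, wavenumbers are reordered by their loading on the first principal component, so that co-varying bands sit next to each other. Each spectrum and its reconstruction from the top principal components are then folded into a square grid. The function names `fit_pca` and `spectra_to_images` and the `n_components` parameter are hypothetical.

```python
import numpy as np


def fit_pca(X, n_components):
    """Fit PCA by SVD on mean-centred spectra; returns (mean, components)."""
    mean = X.mean(axis=0)
    # Rows of vt are principal axes in wavenumber space, ordered by variance.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]


def spectra_to_images(X, n_components=3):
    """Hypothetical two-channel 2D representation of 1D spectra (not DSCAR).

    Channel 0: raw intensities, wavenumbers reordered by PC1 loading.
    Channel 1: the spectrum reconstructed from the top principal
    components, using the same wavenumber ordering.
    Both channels are zero-padded and folded into a side x side grid.
    """
    n_samples, n_features = X.shape
    mean, comps = fit_pca(X, n_components)
    # Assumed restructuring: place wavenumbers with similar PC1 loadings
    # next to each other so neighbouring pixels tend to co-vary.
    order = np.argsort(comps[0])
    # Project onto the component subspace and map back to wavenumber space.
    recon = (X - mean) @ comps.T @ comps + mean
    side = int(np.ceil(np.sqrt(n_features)))
    images = np.zeros((n_samples, 2, side * side))
    images[:, 0, :n_features] = X[:, order]
    images[:, 1, :n_features] = recon[:, order]
    return images.reshape(n_samples, 2, side, side)


rng = np.random.default_rng(0)
spectra = rng.random((10, 1000))  # 10 synthetic spectra, 1000 wavenumbers
imgs = spectra_to_images(spectra)
print(imgs.shape)  # (10, 2, 32, 32)
```

In a real pipeline the PCA would be fitted on the training split only and then applied to held-out spectra. Fitting it on all spectra, as this sketch does for brevity, would leak test information.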