Li Qingbo, Wang Jianwen, Zhou Yan
School of Instrumentation and Optoelectronic Engineering, Precision Opto-Mechatronics Technology Key Laboratory of Education Ministry, Beihang University, Beijing 100191, China.
Department of Neurosurgery, PLA Air Force Medical Center, Beijing 100142, China.
Anal Methods. 2023 Apr 13;15(15):1861-1869. doi: 10.1039/d3ay00188a.
Glioma is a highly infiltrative malignant intracranial tumor, which makes its boundary difficult to identify. Raman spectroscopy has the potential to detect this boundary accurately during surgery. However, fresh normal tissue is difficult to obtain when building a classification model, so normal-tissue samples are far fewer than glioma samples, which biases the classifier toward the majority class. In this study, a data augmentation algorithm based on Gaussian kernel density, GKIM, is proposed to augment the normal-tissue spectra. New spectra are synthesized with a weight coefficient computed from the Gaussian density rather than a fixed coefficient, which increases sample diversity and improves the robustness of modeling. In addition, a fuzzy nearest-neighbor distance replaces the usual fixed number of neighbors when selecting the original spectra for synthesis, so the nearest spectra are determined automatically and new spectra are synthesized adaptively according to the characteristics of the input spectra. This overcomes a problem of common data augmentation methods, in which the newly generated samples are concentrated too heavily in specific regions. A total of 769 Raman spectra of glioma and 136 Raman spectra of normal brain tissue, from 205 and 37 cases respectively, were collected, and the normal-tissue spectra were extended to 600. The accuracy, sensitivity, and specificity were each 91.67%. The proposed method achieved better predictive performance than traditional algorithms for class imbalance.
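The general idea of density-weighted, distance-based oversampling can be illustrated with a minimal Python sketch. This is not the GKIM algorithm itself, whose formulas are not given in the abstract. The function name `density_weighted_oversample`, the median-distance bandwidth, the radius rule standing in for the fuzzy nearest-neighbor distance, and the weight `density[j] / (density[i] + density[j])` are all assumptions made for illustration.

```python
import numpy as np


def density_weighted_oversample(X, n_new, radius_scale=1.0, seed=0):
    """Synthesize n_new minority-class samples by interpolating between
    original samples (hypothetical sketch, not the published GKIM method).

    Assumptions: the bandwidth is the median pairwise distance, neighbors
    are chosen by a distance radius instead of a fixed count, and the
    interpolation weight is a ratio of Gaussian kernel densities.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    if n < 2:
        raise ValueError("need at least two samples to interpolate")

    # Pairwise Euclidean distances between all original spectra.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))

    # Assumed bandwidth: median of the off-diagonal distances.
    bandwidth = np.median(dist[~np.eye(n, dtype=bool)])
    if bandwidth == 0:
        raise ValueError("all samples are identical")

    # Gaussian kernel density estimate at each sample (unnormalized).
    density = np.exp(-dist ** 2 / (2 * bandwidth ** 2)).mean(axis=1)

    radius = radius_scale * bandwidth
    synthetic = np.empty((n_new, X.shape[1]))
    for k in range(n_new):
        i = rng.integers(n)
        d = dist[i].copy()
        d[i] = np.inf  # exclude the seed sample itself
        # Distance-based neighborhood: its size varies with local density.
        neighbors = np.flatnonzero(d <= radius)
        if neighbors.size == 0:
            neighbors = np.array([np.argmin(d)])  # fall back to nearest
        j = rng.choice(neighbors)
        # Density-derived weight in (0, 1): pulls toward the denser sample.
        lam = density[j] / (density[i] + density[j])
        synthetic[k] = X[i] + lam * (X[j] - X[i])
    return synthetic
```

Under these assumptions, a call such as `density_weighted_oversample(normal_spectra, n_new=464)` would bring 136 original spectra to 600 in total, if the 600 reported above is read as the total after augmentation. Here `normal_spectra` is a hypothetical array with one spectrum per row. Because each synthetic spectrum is a convex combination of two originals, it always lies on the segment between them.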