Yousuff Mohamed, Babu Rajasekhara
School of Computer Science and Engineering, Vellore Institute of Technology, Vellore Campus, Vellore, 632014 Tamilnadu India.
Earth Sci Inform. 2023;16(1):825-844. doi: 10.1007/s12145-022-00917-1. Epub 2022 Dec 23.
Spectroscopy is a methodology for gaining knowledge of particles, especially biomolecules, by quantifying the interactions between matter and light. By examining the amount of light absorbed, reflected, or emitted by a specimen, its constituents, properties, and volume can be determined. Spectroscopic measurements are quick, harmless, and contactless, and are therefore now preferred in chemometrics. Because spectra are high-dimensional, it is challenging to build a robust classifier with good performance metrics. Many classification models based on linear and nonlinear dimensionality reduction have been implemented to overcome this issue. However, they either fail to capture the subtle details of the spectra in the low-dimensional space or cannot efficiently handle the nonlinearity present in spectral data. We propose a graph-based neural network embedding approach that extracts appropriate features into a latent space and circumvents the nonlinearity of the spectra. Our approach performs dimensionality reduction in two phases: constructing a nearest-neighbor graph, and producing an almost linear embedding using a fully connected neural network. The low-dimensional embedding is then classified using the Random Forest algorithm. We implemented our technique and compared it with four nonlinear dimensionality reduction techniques widely used for spectral data analysis, using five spectral datasets from specific applications, and evaluated several classification performance metrics for all techniques. Across six different low-dimensional spaces for each dataset, the proposed approach performs competitively, with accuracy above 95% and a Matthews correlation coefficient close to 1. The trustworthiness score of almost 1 shows that the proposed dimensionality reduction approach preserves the nearest-neighbor structure of the high-dimensional spectral inputs in the latent space.
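The pipeline described in the abstract (nearest-neighbor graph, fully connected network embedding, Random Forest classification, then accuracy, Matthews correlation coefficient, and trustworthiness) can be illustrated with a minimal scikit-learn sketch. This is not the authors' implementation: the abstract does not specify the graph construction, network architecture, or training loss, so the sketch substitutes a Laplacian-eigenmaps embedding of a k-nearest-neighbor graph (`SpectralEmbedding`) as the target and fits an `MLPRegressor` to map spectra onto it. The synthetic data, `N_NEIGHBORS`, `N_COMPONENTS`, and the hidden layer sizes are all assumptions chosen for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.manifold import SpectralEmbedding, trustworthiness
from sklearn.metrics import accuracy_score, matthews_corrcoef
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

N_NEIGHBORS = 10   # assumed neighborhood size for the graph
N_COMPONENTS = 5   # assumed latent dimensionality

# Synthetic stand-in for high-dimensional spectra (hypothetical data,
# not one of the five datasets used in the paper).
X, y = make_classification(
    n_samples=300, n_features=200, n_informative=20,
    n_classes=3, n_clusters_per_class=1, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Standardize using statistics from the training split only.
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

# Phase 1 (stand-in): embed the k-nearest-neighbor graph of the training spectra.
graph_embedder = SpectralEmbedding(
    n_components=N_COMPONENTS, affinity="nearest_neighbors",
    n_neighbors=N_NEIGHBORS, random_state=0,
)
Z_target = graph_embedder.fit_transform(X_train_s)
Z_target = Z_target / Z_target.std(axis=0)  # rescale the small eigenvector values

# Phase 2 (stand-in): a fully connected network learns spectra -> embedding,
# so that unseen spectra can also be projected into the latent space.
encoder = MLPRegressor(hidden_layer_sizes=(64, 32), max_iter=2000, random_state=0)
encoder.fit(X_train_s, Z_target)
Z_train = encoder.predict(X_train_s)
Z_test = encoder.predict(X_test_s)

# Classify the low-dimensional embedding with a Random Forest.
clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(Z_train, y_train)
y_pred = clf.predict(Z_test)

acc = accuracy_score(y_test, y_pred)
mcc = matthews_corrcoef(y_test, y_pred)
# Trustworthiness: how well nearest neighbors in the input space are kept in the embedding.
trust = trustworthiness(X_train_s, Z_train, n_neighbors=N_NEIGHBORS)

print(f"accuracy={acc:.3f}  MCC={mcc:.3f}  trustworthiness={trust:.3f}")
```

The scores printed for this synthetic data say nothing about the results reported in the abstract; the sketch only shows where each reported metric comes from in such a pipeline.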