Optical pathology using oral tissue fluorescence spectra: classification by principal component analysis and k-means nearest neighbor analysis.

Author information

Kamath Sudha D, Mahato K K

Affiliation

Center for Laser Spectroscopy, KMC Life Sciences Center, Manipal Academy of Higher Education, Manipal 576 104, India.

Publication information

J Biomed Opt. 2007 Jan-Feb;12(1):014028. doi: 10.1117/1.2437738.

PMID:17343503

Abstract

Spectral analysis and classification of pulsed laser-induced autofluorescence spectra of pathologically certified normal, premalignant, and malignant oral tissues, recorded at 325-nm excitation, are carried out using MATLAB® R6-based principal component analysis (PCA) and k-means nearest neighbor (k-NN) analysis, applied separately to the same set of spectral data. Six features (mean, median, maximum intensity, energy, spectral residuals, and standard deviation) are extracted from each of the 60 training spectra belonging to the normal, premalignant, and malignant groups, and these features are used to perform PCA on the reference database. Standard calibration models of normal, premalignant, and malignant samples are built using cluster analysis. We show that PCA can reduce a feature vector of length 6 to three components. After PCA is performed on the feature space, the first three principal component (PC) scores, which contain all the diagnostic information, are retained, and the remaining scores, which contain only noise, are discarded. The new feature space is thus constructed from three PC scores only and is used as the input database for k-NN classification. In this transformed feature space, the centroids of the normal, premalignant, and malignant samples are computed, and efficient classification of the different classes of oral samples is achieved. The performance of the k-NN classification is evaluated by calculating specificity, sensitivity, and accuracy, which are found to be 100%, 94.5%, and 96.17%, respectively.
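The abstract outlines a pipeline (six features per spectrum, PCA down to three scores, class centroids in PC space, then classification scored by specificity, sensitivity, and accuracy) but gives no implementation details. The NumPy sketch below illustrates that pipeline under explicit assumptions; it is not the authors' MATLAB code. Assumed rather than taken from the paper: "energy" as the sum of squared intensities, "spectral residual" as the squared residual from a cubic polynomial fit, z-scoring of features before PCA, Euclidean nearest-centroid assignment as the classification rule, and specificity/sensitivity defined as normal versus abnormal (premalignant and malignant pooled). The spectra come from a hypothetical generator, synthetic_spectra, with invented band positions; they are not tissue data, and anything it prints says nothing about the paper's results.

```python
import numpy as np

CLASSES = ("normal", "premalignant", "malignant")  # labels 0, 1, 2


def extract_features(spectrum, poly_degree=3):
    """Return the six scalar features named in the abstract.

    ASSUMED definitions: energy = sum of squared intensities;
    spectral residual = sum of squared residuals from a cubic polynomial fit.
    """
    x = np.linspace(0.0, 1.0, spectrum.size)
    coeffs = np.polyfit(x, spectrum, poly_degree)
    residual = np.sum((spectrum - np.polyval(coeffs, x)) ** 2)
    return np.array([
        spectrum.mean(),
        np.median(spectrum),
        spectrum.max(),
        np.sum(spectrum ** 2),
        residual,
        spectrum.std(),
    ])


def fit_pca(features, n_components=3):
    """PCA via SVD of z-scored features (z-scoring is an assumption)."""
    mu = features.mean(axis=0)
    sigma = features.std(axis=0)
    sigma[sigma == 0] = 1.0  # guard against constant features
    z = (features - mu) / sigma
    _, s, vt = np.linalg.svd(z, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)  # variance fraction per component
    return mu, sigma, vt[:n_components], explained


def project(features, mu, sigma, axes):
    """PC scores: standardized features projected onto the retained axes."""
    return ((features - mu) / sigma) @ axes.T


def class_centroids(scores, labels):
    """Mean PC-score vector of each class, one row per class."""
    return np.array([scores[labels == c].mean(axis=0) for c in range(len(CLASSES))])


def nearest_centroid(scores, centroids):
    """Assign each score vector to the class with the closest centroid (Euclidean)."""
    dist = np.linalg.norm(scores[:, None, :] - centroids[None, :, :], axis=2)
    return dist.argmin(axis=1)


def evaluate(true, pred):
    """ASSUMED definitions: specificity = normals classified normal;
    sensitivity = abnormal spectra classified abnormal; accuracy = exact class match."""
    normal = true == 0
    specificity = np.mean(pred[normal] == 0)
    sensitivity = np.mean(pred[~normal] != 0)
    accuracy = np.mean(pred == true)
    return specificity, sensitivity, accuracy


def synthetic_spectra(n_per_class, rng, n_points=200):
    """HYPOTHETICAL stand-in data: one Gaussian band per class plus noise.
    Band centres and heights are invented, not measured tissue values."""
    wl = np.linspace(350.0, 700.0, n_points)  # wavelength axis in nm
    bands = [(440.0, 1.0), (480.0, 0.8), (520.0, 0.6)]  # (centre nm, height)
    spectra, labels = [], []
    for label, (centre, height) in enumerate(bands):
        for _ in range(n_per_class):
            clean = height * np.exp(-((wl - centre) / 40.0) ** 2)
            spectra.append(clean + rng.normal(0.0, 0.02, n_points))
            labels.append(label)
    return np.array(spectra), np.array(labels)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    train_x, train_y = synthetic_spectra(20, rng)  # 60 training spectra
    test_x, test_y = synthetic_spectra(10, rng)
    f_train = np.array([extract_features(s) for s in train_x])
    f_test = np.array([extract_features(s) for s in test_x])
    mu, sigma, axes, explained = fit_pca(f_train, n_components=3)
    centroids = class_centroids(project(f_train, mu, sigma, axes), train_y)
    pred = nearest_centroid(project(f_test, mu, sigma, axes), centroids)
    spec, sens, acc = evaluate(test_y, pred)
    print(f"variance kept by 3 PCs: {explained[:3].sum():.3f}")
    print(f"specificity={spec:.1%} sensitivity={sens:.1%} accuracy={acc:.1%}")
```

Replacing nearest_centroid with a majority vote over the k closest training score vectors would give a conventional k-NN classifier; the abstract mentions computed centroids, which is why the sketch uses the centroid rule, but the exact variant the authors used is not stated.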
