Zhao Weixiang, Davis Cristina E
Department of Mechanical and Aeronautical Engineering, One Shields Avenue, University of California, Davis, CA 95616, United States.
Anal Chim Acta. 2009 Sep 28;651(1):15-23. doi: 10.1016/j.aca.2009.08.008. Epub 2009 Aug 15.
This paper introduces the ant colony algorithm (ACA), a novel swarm-intelligence-based optimization method, as a new feature selection method for ovarian cancer diagnostics: the algorithm selects appropriate wavelet coefficients from mass spectral data. After determining suitable parameters for the ACA-based search, we ran the feature search 100 times with the number of selected features fixed at five. The results show that: (1) classification accuracy based on the five selected wavelet coefficients can reach 100% on the training, validation, and independent testing sets; (2) the eight wavelet coefficients selected most frequently across the 100 runs give 100% accuracy on the training set, 100% on the validation set, and 98.8% on the independent testing set, which suggests that the proposed feature selection method is robust and accurate; and (3) the mass spectral data corresponding to these eight coefficients can be located by the inverse wavelet transform, and the located mass spectral data still maintain high classification accuracy (100% on the training set, 97.6% on the validation set, and 98.8% on the testing set) while providing sufficient physical and medical meaning for future studies of ovarian cancer mechanisms. Furthermore, the corresponding mass spectral data (potential biomarkers) agree well with other studies that used the same sample set. Together, these results suggest that this feature extraction strategy will benefit the development of intelligent, real-time diagnosis and monitoring systems based on spectroscopy instrumentation.
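The search described above (repeated runs, a fixed subset size of five, and a tally of the most frequently selected features) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the classifier, the pheromone update rule, or the parameter values, so the nearest-centroid classifier, the evaporation rate `rho`, the ant and iteration counts, the toy data, and every function name below are assumptions made for illustration. The features here stand in for wavelet coefficients, and 10 runs are used instead of 100 to keep the example quick.

```python
import random
from collections import Counter


def nearest_centroid_accuracy(train, valid, features):
    """Accuracy of a nearest-centroid classifier restricted to `features`.

    Assumption: the abstract does not name the classifier; this one is a stand-in.
    """
    centroids = {}
    for label in {y for _, y in train}:
        rows = [x for x, y in train if y == label]
        centroids[label] = [sum(r[f] for r in rows) / len(rows) for f in features]
    correct = 0
    for x, y in valid:
        pred = min(
            centroids,
            key=lambda c: sum((x[f] - m) ** 2 for f, m in zip(features, centroids[c])),
        )
        correct += pred == y
    return correct / len(valid)


def sample_subset(pheromone, k, rng):
    """One ant draws k distinct features, weighted by pheromone level."""
    candidates = list(range(len(pheromone)))
    chosen = []
    for _ in range(k):
        weights = [pheromone[f] for f in candidates]
        pick = rng.choices(candidates, weights=weights, k=1)[0]
        chosen.append(pick)
        candidates.remove(pick)
    return sorted(chosen)


def aco_select(train, valid, k=5, n_ants=10, n_iters=20, rho=0.2, seed=0):
    """Simplified ant-colony search for a k-feature subset (hypothetical parameters)."""
    rng = random.Random(seed)
    pheromone = [1.0] * len(train[0][0])
    best_subset, best_acc = None, -1.0
    for _ in range(n_iters):
        trails = []
        for _ in range(n_ants):
            subset = sample_subset(pheromone, k, rng)
            acc = nearest_centroid_accuracy(train, valid, subset)
            trails.append((subset, acc))
            if acc > best_acc:
                best_subset, best_acc = subset, acc
        # Evaporate old pheromone, then deposit in proportion to validation accuracy.
        pheromone = [(1 - rho) * p for p in pheromone]
        for subset, acc in trails:
            for f in subset:
                pheromone[f] += acc
    return best_subset, best_acc


def make_toy_data(n, n_features, informative, rng):
    """Synthetic two-class data; only the `informative` features separate the classes."""
    data = []
    for i in range(n):
        label = i % 2
        x = [rng.gauss(0, 1) for _ in range(n_features)]
        for f in informative:
            x[f] += 4.0 * label
        data.append((x, label))
    return data


# Repeat the search and tally how often each feature is selected across runs.
rng = random.Random(42)
informative = [3, 7, 11]  # hypothetical "true" features in the toy data
train = make_toy_data(60, 20, informative, rng)
valid = make_toy_data(40, 20, informative, rng)
counts = Counter()
for run in range(10):
    subset, acc = aco_select(train, valid, seed=run)
    counts.update(subset)
print(counts.most_common(8))
```

Scoring each subset on a validation set rather than the training set mirrors the training/validation/testing split mentioned in the abstract; an independent testing set would still be needed to assess the final features.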