Assessing effects of pre-processing mass spectrometry data on classification performance.

Authors

Ozcift Akin, Gulten Arif

Affiliation

Department of Electrical and Electronics Engineering, Firat University, Turkey.

Publication information

Eur J Mass Spectrom (Chichester). 2008;14(5):267-73. doi: 10.1255/ejms.938.

DOI: 10.1255/ejms.938

PMID: 19023144

Abstract

Disease prediction from mass spectrometry (MS) data is gaining importance in medical diagnosis. In cancer in particular, early prediction can be life-saving. The high dimensionality and noisy nature of MS data require a two-phase approach for successful disease prediction. First, the MS data must be pre-processed in stages such as baseline correction, normalization, de-noising and peak detection. Second, a classifier based on dimension reduction must be designed. Once the data have been pre-processed, the prediction accuracy of the classifier becomes the most significant factor in the diagnosis phase, and because health is at stake, that accuracy is critical. This study addresses the effects of the MS pre-processing stages on classifier performance. Three pre-processing stages (baseline correction, normalization and de-noising) are applied to three MS data sets: high-resolution ovarian cancer, low-resolution prostate cancer and low-resolution ovarian cancer. To measure the effects of the pre-processing stages quantitatively, four diverse classifiers are applied to the data sets: a genetic algorithm wrapped K-nearest neighbor (GA-KNN), principal component analysis-based linear discriminant analysis (PCA-LDA), a neural network (NN) and a support vector machine (SVM). The calculated classifier performances demonstrate quantitatively how the pre-processing stages affect the prediction accuracy of the classifiers. The results of the computations are presented.
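The abstract names three pre-processing stages but does not say which algorithms were used for them. As a rough illustration only, the sketch below chains one simple choice per stage: a rolling-minimum baseline estimate, total ion current normalization, and moving-average smoothing. All function names, window sizes and the toy spectrum are assumptions made for this example, not the authors' method.

```python
from typing import List


def baseline_correct(intensities: List[float], window: int = 5) -> List[float]:
    """Subtract a rolling-minimum baseline estimate (an assumed, simple method)."""
    n = len(intensities)
    half = window // 2
    corrected = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        # The local minimum approximates the slowly varying baseline.
        corrected.append(intensities[i] - min(intensities[lo:hi]))
    return corrected


def normalize_tic(intensities: List[float]) -> List[float]:
    """Scale intensities so they sum to 1 (total ion current normalization)."""
    total = sum(intensities)
    if total == 0:
        return list(intensities)  # nothing to scale in an all-zero spectrum
    return [x / total for x in intensities]


def denoise(intensities: List[float], window: int = 3) -> List[float]:
    """Smooth with a centered moving average; the window shrinks at the edges."""
    n = len(intensities)
    half = window // 2
    smoothed = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        segment = intensities[lo:hi]
        smoothed.append(sum(segment) / len(segment))
    return smoothed


def preprocess(intensities: List[float]) -> List[float]:
    """Apply the three stages in the order listed in the abstract."""
    return denoise(normalize_tic(baseline_correct(intensities)))


# Hypothetical toy spectrum: a flat offset near 10 with two peaks.
spectrum = [10.0, 10.5, 11.0, 30.0, 11.5, 11.0, 10.0, 25.0, 10.5, 10.0]
processed = preprocess(spectrum)
```

Running a classifier on `spectrum` and again on `processed`, then comparing accuracies, would mirror the kind of comparison the study describes, although the classifiers themselves are outside this sketch.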

Similar articles

Assessing effects of pre-processing mass spectrometry data on classification performance.

Eur J Mass Spectrom (Chichester). 2008;14(5):267-73. doi: 10.1255/ejms.938.

Designing a robust feature extraction method based on optimum allocation and principal component analysis for epileptic EEG signal classification.

Comput Methods Programs Biomed. 2015 Apr;119(1):29-42. doi: 10.1016/j.cmpb.2015.01.002. Epub 2015 Jan 30.

Automated pipeline for classifying Aroclors in soil by gas chromatography/mass spectrometry using modulo compressed two-way data objects.

Talanta. 2013 Dec 15;117:483-91. doi: 10.1016/j.talanta.2013.09.050. Epub 2013 Oct 7.

A wavelet-based data pre-processing analysis approach in mass spectrometry.

Comput Biol Med. 2007 Apr;37(4):509-16. doi: 10.1016/j.compbiomed.2006.08.009. Epub 2006 Sep 18.

Channel selection and classification of electroencephalogram signals: an artificial neural network and genetic algorithm-based approach.

Artif Intell Med. 2012 Jun;55(2):117-26. doi: 10.1016/j.artmed.2012.02.001. Epub 2012 Apr 12.

The effect of generalized discriminate analysis (GDA) to the classification of optic nerve disease from VEP signals.

Comput Biol Med. 2008 Jan;38(1):62-8. doi: 10.1016/j.compbiomed.2007.07.002. Epub 2007 Aug 20.

Design of Electronic Nose Detection System for Apple Quality Grading Based on Computational Fluid Dynamics Simulation and K-Nearest Neighbor Support Vector Machine.

Sensors (Basel). 2022 Apr 14;22(8):2997. doi: 10.3390/s22082997.

Automated identification of normal and diabetes heart rate signals using nonlinear measures.

Comput Biol Med. 2013 Oct;43(10):1523-9. doi: 10.1016/j.compbiomed.2013.05.024. Epub 2013 Jun 6.

Machine learning for improved pathological staging of prostate cancer: a performance comparison on a range of classifiers.

Artif Intell Med. 2012 May;55(1):25-35. doi: 10.1016/j.artmed.2011.11.003. Epub 2011 Dec 27.

Classification of electrocardiogram signals with support vector machines and particle swarm optimization.

IEEE Trans Inf Technol Biomed. 2008 Sep;12(5):667-77. doi: 10.1109/TITB.2008.923147.

Cited by

Cumulative learning enables convolutional neural network representations for small mass spectrometry data classification.

Nat Commun. 2020 Nov 5;11(1):5595. doi: 10.1038/s41467-020-19354-z.

SVM feature selection based rotation forest ensemble classifiers to improve computer-aided diagnosis of Parkinson disease.

J Med Syst. 2012 Aug;36(4):2141-7. doi: 10.1007/s10916-011-9678-1. Epub 2011 Mar 10.

A robust multi-class feature selection strategy based on Rotation Forest Ensemble algorithm for diagnosis of Erythemato-Squamous diseases.

J Med Syst. 2012 Apr;36(2):941-9. doi: 10.1007/s10916-010-9558-0. Epub 2010 Jul 13.
