A fuzzy-based data transformation for feature extraction to increase classification performance with small medical data sets.

Author information

Department of Industrial and Information Management, National Cheng Kung University, 1, University Road, Tainan 70101, Taiwan.

Publication information

Artif Intell Med. 2011 May;52(1):45-52. doi: 10.1016/j.artmed.2011.02.001. Epub 2011 Apr 13.

DOI:10.1016/j.artmed.2011.02.001

PMID:21493051

Abstract

OBJECTIVE

Medical data sets are usually small and have very high dimensionality. Too many attributes make the analysis less efficient without necessarily increasing accuracy, while too few samples reduce modeling stability. Consequently, the main objective of this study is to extract the optimal subset of features to increase analytical performance when the data set is small.

METHODS

This paper proposes a fuzzy-based non-linear transformation method that extends the classification-related information in the original attribute values of a small data set. Principal component analysis (PCA) is then applied to the transformed data set to extract the optimal subset of features. Finally, the transformed data with these optimal features serve as the input to a learning tool, a support vector machine (SVM). Six medical data sets (Pima Indians' diabetes, Wisconsin diagnostic breast cancer, Parkinson's disease, echocardiogram, BUPA liver disorders, and bladder cancer cases in Taiwan) are employed to illustrate the approach.
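The abstract does not give the membership functions or the PCA settings, so the following is only a minimal sketch of the transform-then-PCA idea under stated assumptions. It assumes one triangular membership function per class and attribute, built from the class minimum, mean, and maximum in the training data. The function names (`triangular_membership`, `fit_fuzzy_params`, `fuzzy_transform`, `pca_fit`, `pca_apply`) and the synthetic data are hypothetical and are not taken from the paper.

```python
import numpy as np

def triangular_membership(x, lo, peak, hi):
    """Triangular membership degree: 0 at lo and hi, 1 at peak (assumed shape)."""
    x = np.asarray(x, dtype=float)
    # Fall back to a step when one side of the triangle has zero width.
    rising = (x - lo) / (peak - lo) if peak > lo else (x >= lo).astype(float)
    falling = (hi - x) / (hi - peak) if hi > peak else (x <= hi).astype(float)
    return np.clip(np.minimum(rising, falling), 0.0, 1.0)

def fit_fuzzy_params(X, y):
    """Per class: (min, mean, max) of every attribute. Assumption, not the paper's rule."""
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        params[c] = (Xc.min(axis=0), Xc.mean(axis=0), Xc.max(axis=0))
    return params

def fuzzy_transform(X, params):
    """Map each attribute to one membership degree per class (d attributes -> d * n_classes)."""
    cols = []
    for c in sorted(params):
        lo, peak, hi = params[c]
        for j in range(X.shape[1]):
            cols.append(triangular_membership(X[:, j], lo[j], peak[j], hi[j]))
    return np.column_stack(cols)

def pca_fit(Z, n_components):
    """Plain PCA via SVD of the centered data; returns the mean and the top components."""
    mean = Z.mean(axis=0)
    _, _, vt = np.linalg.svd(Z - mean, full_matrices=False)
    return mean, vt[:n_components]

def pca_apply(Z, mean, components):
    """Project data onto the retained principal components."""
    return (Z - mean) @ components.T

# Synthetic two-class data standing in for a small medical data set (placeholder values).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (15, 4)), rng.normal(1.5, 1.0, (15, 4))])
y = np.array([0] * 15 + [1] * 15)

params = fit_fuzzy_params(X, y)
Z = fuzzy_transform(X, params)        # 4 attributes x 2 classes = 8 fuzzy features
mean, components = pca_fit(Z, 3)
F = pca_apply(Z, mean, components)    # reduced features for a downstream classifier
```

In a full pipeline, `F` would be passed to an SVM (for example scikit-learn's `SVC`), and the fuzzy parameters and PCA components would be fitted on the training folds only so that no test information leaks into the features.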

RESULTS

This research uses the t-test to evaluate classification accuracy on each single data set, and the Friedman test to show that the proposed method is better than other methods across the multiple data sets. The experimental results indicate that the proposed method has better classification performance than either PCA or kernel principal component analysis (KPCA) when the data set is small, and suggest that creating new purpose-related information improves analysis performance.
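For reference, the Friedman test mentioned above compares k methods over N data sets by ranking the methods within each data set. The abstract does not say which variant of the statistic was used; the standard chi-square form is shown here as an assumption, with N = 6 data sets in this study and k the number of methods compared.

```latex
\chi_F^2 = \frac{12N}{k(k+1)} \left[ \sum_{j=1}^{k} R_j^2 - \frac{k(k+1)^2}{4} \right],
\qquad
R_j = \frac{1}{N} \sum_{i=1}^{N} r_i^{j}
```

Here r_i^j is the rank of method j on data set i (rank 1 for the most accurate method, average ranks for ties) and R_j is its mean rank. Under the null hypothesis that all methods perform equally, the statistic approximately follows a chi-square distribution with k - 1 degrees of freedom.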

CONCLUSION

This paper has shown that feature extraction, acting as a form of feature selection, is important for efficient data analysis. When the data set is small, using the fuzzy-based transformation method presented in this work to increase the available information produces better results than the PCA and KPCA approaches.

Similar articles

Medical data mining by fuzzy modeling with selected features.

Artif Intell Med. 2008 Jul;43(3):195-206. doi: 10.1016/j.artmed.2008.04.004. Epub 2008 Jun 5.

Fuzzy wavelet packet based feature extraction method and its application to biomedical signal classification.

IEEE Trans Biomed Eng. 2005 Jun;52(6):1132-9. doi: 10.1109/TBME.2005.848377.

An interpretable fuzzy rule-based classification methodology for medical diagnosis.

Artif Intell Med. 2009 Sep;47(1):25-41. doi: 10.1016/j.artmed.2009.05.003. Epub 2009 Jun 18.

Classification for high-throughput data with an optimal subset of principal components.

Comput Biol Chem. 2009 Oct;33(5):408-13. doi: 10.1016/j.compbiolchem.2009.07.017. Epub 2009 Aug 18.

A threshold fuzzy entropy based feature selection for medical database classification.

Comput Biol Med. 2013 Dec;43(12):2222-9. doi: 10.1016/j.compbiomed.2013.10.016. Epub 2013 Oct 25.

Classification of jet fuels by fuzzy rule-building expert systems applied to three-way data by fast gas chromatography--fast scanning quadrupole ion trap mass spectrometry.

Talanta. 2011 Jan 30;83(4):1260-8. doi: 10.1016/j.talanta.2010.05.063. Epub 2010 Jun 8.

The application of mutual information-based feature selection and fuzzy LS-SVM-based classifier in motion classification.

Comput Methods Programs Biomed. 2008 Jun;90(3):275-84. doi: 10.1016/j.cmpb.2008.01.003. Epub 2008 Mar 4.

Comput Biol Med. 2007 Aug;37(8):1133-40. doi: 10.1016/j.compbiomed.2006.10.005. Epub 2006 Nov 28.

An expert system based on principal component analysis, artificial immune system and fuzzy k-NN for diagnosis of valvular heart diseases.

Comput Biol Med. 2008 Mar;38(3):329-38. doi: 10.1016/j.compbiomed.2007.11.004. Epub 2008 Jan 4.

Cited by

Exploring unsupervised feature extraction algorithms: tackling high dimensionality in small datasets.

Sci Rep. 2025 Jul 1;15(1):21973. doi: 10.1038/s41598-025-07725-9.

PD-DETECTOR: A sustainable and computationally intelligent mobile application model for Parkinson's disease severity assessment.

Heliyon. 2024 Jul 15;10(14):e34593. doi: 10.1016/j.heliyon.2024.e34593. eCollection 2024 Jul 30.

Advances in the field of developing biomarkers for re-irradiation: a how-to guide to small, powerful data sets and artificial intelligence.

Expert Rev Precis Med Drug Dev. 2024;9(1):3-16. doi: 10.1080/23808993.2024.2325936. Epub 2024 Mar 11.

Parkinson's disease detection based on features refinement through L1 regularized SVM and deep neural network.

Sci Rep. 2024 Jan 16;14(1):1333. doi: 10.1038/s41598-024-51600-y.

Parkinson's Disease Diagnosis Using Laplacian Score, Gaussian Process Regression and Self-Organizing Maps.

Brain Sci. 2023 Mar 24;13(4):543. doi: 10.3390/brainsci13040543.

Bayesian Optimization with Support Vector Machine Model for Parkinson Disease Classification.

Sensors (Basel). 2023 Feb 13;23(4):2085. doi: 10.3390/s23042085.

Resting-state EEG-based convolutional neural network for the diagnosis of depression and its severity.

Front Physiol. 2022 Oct 10;13:956254. doi: 10.3389/fphys.2022.956254. eCollection 2022.

An Efficient Rotation Forest-Based Ensemble Approach for Predicting Severity of Parkinson's Disease.

J Healthc Eng. 2022 Jun 23;2022:5524852. doi: 10.1155/2022/5524852. eCollection 2022.

The role of uropathogenic Escherichia coli adhesive molecules in inflammatory response- comparative study on immunocompetent hosts and kidney recipients.

PLoS One. 2022 May 23;17(5):e0268243. doi: 10.1371/journal.pone.0268243. eCollection 2022.

Rapid, label-free classification of tumor-reactive T cell killing with quantitative phase microscopy and machine learning.

Sci Rep. 2021 Sep 30;11(1):19448. doi: 10.1038/s41598-021-98567-8.
