A fuzzy-based data transformation for feature extraction to increase classification performance with small medical data sets.

Affiliation

Department of Industrial and Information Management, National Cheng Kung University, 1, University Road, Tainan 70101, Taiwan.

Publication information

Artif Intell Med. 2011 May;52(1):45-52. doi: 10.1016/j.artmed.2011.02.001. Epub 2011 Apr 13.

Abstract

OBJECTIVE

Medical data sets are usually small and have very high dimensionality. Too many attributes make the analysis less efficient without necessarily increasing accuracy, while too little data reduces modeling stability. Consequently, the main objective of this study is to extract the optimal subset of features to increase analytical performance when the data set is small.

METHODS

This paper proposes a fuzzy-based non-linear transformation method that extends the classification-related information contained in the original attribute values of a small data set. Principal component analysis (PCA) is then applied to the transformed data set to extract the optimal subset of features. Finally, the transformed data with these optimal features serve as the input to a learning tool, a support vector machine (SVM). Six medical data sets are employed to illustrate the approach: Pima Indians' diabetes, Wisconsin diagnostic breast cancer, Parkinson's disease, echocardiogram, BUPA liver disorders, and bladder cancer cases in Taiwan.
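The abstract does not specify the membership functions used, so the following is only a minimal sketch of what a fuzzy-based transformation of this kind could look like, not the authors' method. It assumes a triangular membership function per class and attribute, with the triangle defined by the minimum, mean, and maximum of that class's training values. All function names and the toy numbers are hypothetical and chosen for illustration only.

```python
from statistics import mean


def triangular_membership(x, low, peak, high):
    """Triangular membership: 1 at the peak, falling linearly to 0 at low and high."""
    if x == peak:
        return 1.0
    if x <= low or x >= high:
        return 0.0
    if x < peak:
        return (x - low) / (peak - low)
    return (high - x) / (high - peak)


def fit_class_triangles(rows, labels):
    """Assumption: each class/attribute triangle is (min, mean, max) of the training values."""
    params = {}
    for label in sorted(set(labels)):
        members = [row for row, y in zip(rows, labels) if y == label]
        params[label] = [(min(col), mean(col), max(col)) for col in zip(*members)]
    return params


def fuzzy_transform(rows, params):
    """Replace every attribute value with one membership degree per class."""
    transformed = []
    for row in rows:
        features = []
        for j, x in enumerate(row):
            for label in params:  # classes were inserted in sorted order
                low, peak, high = params[label][j]
                features.append(triangular_membership(x, low, peak, high))
        transformed.append(features)
    return transformed


# Toy placeholder data (not taken from any of the six data sets).
train_rows = [[1.0, 10.0], [2.0, 12.0], [3.0, 14.0],
              [6.0, 20.0], [7.0, 22.0], [8.0, 24.0]]
train_labels = [0, 0, 0, 1, 1, 1]

params = fit_class_triangles(train_rows, train_labels)
new_features = fuzzy_transform([[2.5, 13.0]], params)
print(new_features)  # [[0.5, 0.0, 0.5, 0.0]]
```

In this sketch, two original attributes and two classes yield four transformed features per sample, which is one way to "extend" the class-related information before feature extraction. The transformed rows could then be passed to a PCA step and an SVM classifier from any standard library, following the order described in the abstract.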

RESULTS

This research uses the t-test to evaluate classification accuracy on each single data set, and the Friedman test to show that the proposed method outperforms the other methods across the multiple data sets. The experimental results indicate that the proposed method achieves better classification performance than either PCA or kernel principal component analysis (KPCA) when the data set is small, and they suggest that creating new purpose-related information improves analytical performance.
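For reference, the Friedman test compares k methods over N data sets by ranking the methods within each data set. The abstract does not say which form of the statistic was used; the expression below is the common textbook form, and the symbols are introduced here for illustration. Here r_i^j is the rank of method j on data set i, and N = 6 for the six data sets listed in the abstract.

```latex
R_j = \frac{1}{N} \sum_{i=1}^{N} r_i^{j},
\qquad
\chi_F^2 = \frac{12N}{k(k+1)} \left[ \sum_{j=1}^{k} R_j^2 - \frac{k(k+1)^2}{4} \right]
```

Under the null hypothesis that all methods perform equally, this statistic is approximately chi-square distributed with k - 1 degrees of freedom. The approximation is rough when N and k are small.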

CONCLUSION

This paper has shown that feature extraction, as a function of feature selection, is important for efficient data analysis. When the data set is small, using the fuzzy-based transformation method presented in this work to increase the available information produces better results than the PCA and KPCA approaches.
