A fuzzy-based data transformation for feature extraction to increase classification performance with small medical data sets.

Affiliation

Department of Industrial and Information Management, National Cheng Kung University, 1, University Road, Tainan 70101, Taiwan.

Publication information

Artif Intell Med. 2011 May;52(1):45-52. doi: 10.1016/j.artmed.2011.02.001. Epub 2011 Apr 13.

DOI: 10.1016/j.artmed.2011.02.001
PMID: 21493051

Abstract

OBJECTIVE

Medical data sets are usually small and have very high dimensionality. Too many attributes make the analysis less efficient without necessarily increasing accuracy, while too little data decreases modeling stability. Consequently, the main objective of this study is to extract the optimal subset of features to increase analytical performance when the data set is small.

METHODS

This paper proposes a fuzzy-based non-linear transformation method that extends the classification-related information in the original attribute values of a small data set. On the transformed data set, the study applies principal component analysis (PCA) to extract the optimal subset of features. Finally, the transformed data with these optimal features serve as input to a learning tool, a support vector machine (SVM). Six medical data sets are employed to illustrate the approach: Pima Indians diabetes, Wisconsin diagnostic breast cancer, Parkinson's disease, echocardiogram, BUPA liver disorders, and bladder cancer cases in Taiwan.
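The abstract does not say which membership function the transformation uses, so the following is only a minimal sketch of one plausible form, not the authors' method. It assumes a triangular membership function per class and attribute, built from the class-wise minimum, mean, and maximum of the training values. The function names (`triangular_membership`, `fit_class_triangles`, `fuzzy_transform`) and the toy data are hypothetical.

```python
from statistics import mean


def triangular_membership(x, low, peak, high):
    """Degree in [0, 1] to which x belongs to the triangle (low, peak, high)."""
    if x == peak:
        return 1.0
    if x <= low or x >= high:
        return 0.0
    if x < peak:
        return (x - low) / (peak - low)
    return (high - x) / (high - peak)


def fit_class_triangles(X, y):
    """Store (min, mean, max) of every attribute, separately for each class.

    Assumption: this class-wise triangle is an illustrative choice; the
    abstract does not specify the membership function.
    """
    triangles = {}
    for label in sorted(set(y)):
        rows = [row for row, target in zip(X, y) if target == label]
        triangles[label] = [(min(col), mean(col), max(col)) for col in zip(*rows)]
    return triangles


def fuzzy_transform(X, triangles):
    """Replace each attribute value by one membership degree per class."""
    transformed = []
    for row in X:
        new_row = []
        for j, value in enumerate(row):
            for label in sorted(triangles):
                low, peak, high = triangles[label][j]
                new_row.append(triangular_membership(value, low, peak, high))
        transformed.append(new_row)
    return transformed


# Hypothetical toy data: 6 samples, 2 attributes, 2 classes.
X = [[1.0, 10.0], [2.0, 12.0], [3.0, 14.0],
     [6.0, 20.0], [7.0, 22.0], [8.0, 24.0]]
y = [0, 0, 0, 1, 1, 1]

triangles = fit_class_triangles(X, y)
Z = fuzzy_transform(X, triangles)  # 2 attributes x 2 classes = 4 columns per row
```

In this sketch each original attribute becomes one column per class, so the transformed data carry class-related information that the raw values do not. In a full pipeline the columns of `Z` would then go through PCA and the retained components into an SVM; those two steps are left out here to keep the example dependency-free.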

RESULTS

This research uses the t-test to evaluate classification accuracy on each single data set, and the Friedman test to show that the proposed method is better than the other methods across the multiple data sets. The experimental results indicate that the proposed method has better classification performance than either PCA or kernel principal component analysis (KPCA) when the data set is small, and suggest that creating new purpose-related information improves analysis performance.
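To make the multi-data-set comparison concrete, here is a small sketch of the Friedman statistic computed from per-data-set ranks. The accuracy values are invented for illustration and are not results from the paper. The helper names are hypothetical, and the statistic uses the standard chi-square form without a tie correction.

```python
def average_ranks(scores):
    """Rank one row of scores: highest score gets rank 1, ties share the mean rank."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of scores tied with position i.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def friedman_statistic(table):
    """Return (mean rank per method, Friedman chi-square statistic).

    table[d][m] is the accuracy of method m on data set d.
    """
    n = len(table)      # number of data sets
    k = len(table[0])   # number of methods
    rank_rows = [average_ranks(row) for row in table]
    mean_ranks = [sum(col) / n for col in zip(*rank_rows)]
    chi2 = 12 * n / (k * (k + 1)) * (
        sum(r * r for r in mean_ranks) - k * (k + 1) ** 2 / 4
    )
    return mean_ranks, chi2


# Hypothetical accuracies (NOT from the paper); columns: proposed, PCA, KPCA.
accuracies = [
    [0.80, 0.75, 0.74],
    [0.95, 0.93, 0.94],
    [0.88, 0.85, 0.83],
    [0.90, 0.86, 0.87],
    [0.72, 0.70, 0.69],
    [0.78, 0.76, 0.75],
]

mean_ranks, chi2 = friedman_statistic(accuracies)
print("mean ranks:", mean_ranks)
print("Friedman chi-square:", round(chi2, 2))
```

With these invented numbers the statistic is about 9.33, which exceeds 5.991, the 5% critical value of the chi-square distribution with 2 degrees of freedom. The chi-square approximation is rough with only six data sets, so a result like this should be read as indicative rather than exact.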

CONCLUSION

This paper has shown that feature extraction, in its role as a form of feature selection, is important for efficient data analysis. When the data set is small, using the fuzzy-based transformation method presented in this work to increase the information available produces better results than the PCA and KPCA approaches.

