Medical data mining by fuzzy modeling with selected features.

Authors

Ghazavi Sean N, Liao Thunshun W

Affiliation

Industrial Engineering Department, Louisiana State University, Baton Rouge, LA 70803, USA.

Publication information

Artif Intell Med. 2008 Jul;43(3):195-206. doi: 10.1016/j.artmed.2008.04.004. Epub 2008 Jun 5.

Abstract

OBJECTIVE

Medical data is often very high dimensional. Depending on the use, some data dimensions may be more relevant than others. In processing medical data, choosing the optimal subset of features is important, not only to reduce the processing cost but also to improve the usefulness of the model built from the selected data. This paper presents a data mining study of medical data with fuzzy modeling methods that use feature subsets selected by several indices/methods.

METHODS

Specifically, three fuzzy modeling methods are employed: the fuzzy k-nearest neighbor algorithm, a fuzzy clustering-based modeling method, and the adaptive network-based fuzzy inference system. For feature selection, a total of 11 indices/methods are used. The medical data mined include the Wisconsin breast cancer dataset and the Pima Indians diabetes dataset. The classification accuracy and computational time are reported. To show how good the best performer is, the global optimum was also found by exhaustively testing all possible feature subsets with three features.
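The exhaustive baseline described above can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not state the neighbor count, the fuzzifier, or the evaluation protocol, so `k=5`, `m=2.0`, a single train/test split, crisp training labels, and the synthetic data below are all assumptions made for illustration. The function names `fuzzy_knn_predict` and `best_feature_subset` are hypothetical. The classifier uses the inverse-distance weighting commonly associated with fuzzy k-nearest neighbor, and the search simply scores every three-feature subset.

```python
from itertools import combinations

import numpy as np


def fuzzy_knn_predict(X_train, y_train, X_test, k=5, m=2.0):
    """Simplified fuzzy k-NN with crisp training labels (k and m are assumed)."""
    classes = np.unique(y_train)
    # Pairwise Euclidean distances, shape (n_test, n_train).
    dists = np.linalg.norm(X_test[:, None, :] - X_train[None, :, :], axis=2)
    # Indices and distances of the k nearest training points per test point.
    nn_idx = np.argsort(dists, axis=1)[:, :k]
    nn_dist = np.take_along_axis(dists, nn_idx, axis=1)
    # Inverse-distance weights; the floor avoids division by zero.
    weights = 1.0 / np.maximum(nn_dist, 1e-12) ** (2.0 / (m - 1.0))
    # One-hot class indicator of each neighbor, shape (n_test, k, n_classes).
    onehot = (y_train[nn_idx][:, :, None] == classes[None, None, :]).astype(float)
    # Weighted class memberships, normalized to sum to 1 per test point.
    memberships = (weights[:, :, None] * onehot).sum(axis=1)
    memberships /= memberships.sum(axis=1, keepdims=True)
    return classes[np.argmax(memberships, axis=1)], memberships


def best_feature_subset(X_train, y_train, X_test, y_test, subset_size=3, k=5):
    """Score every feature subset of the given size and keep the most accurate."""
    best_subset, best_acc = None, -1.0
    for subset in combinations(range(X_train.shape[1]), subset_size):
        cols = list(subset)
        pred, _ = fuzzy_knn_predict(X_train[:, cols], y_train, X_test[:, cols], k=k)
        acc = float(np.mean(pred == y_test))
        if acc > best_acc:  # ties keep the first subset found
            best_subset, best_acc = subset, acc
    return best_subset, best_acc


# Synthetic stand-in data (NOT the breast cancer or diabetes datasets):
# six features, of which only features 0, 2, and 4 carry class information.
rng = np.random.default_rng(0)
n_samples = 120
y = rng.integers(0, 2, size=n_samples)
X = rng.normal(size=(n_samples, 6))
X[:, [0, 2, 4]] += 3.0 * (2 * y - 1)[:, None]

X_tr, y_tr, X_te, y_te = X[:80], y[:80], X[80:], y[80:]
subset, acc = best_feature_subset(X_tr, y_tr, X_te, y_te)
_, memberships = fuzzy_knn_predict(X_tr, y_tr, X_te)
print("best 3-feature subset:", subset, "accuracy:", acc)
```

With d features there are C(d, 3) candidate subsets (20 for the six features used here), so the exhaustive search is feasible only for small d. This is why it serves as a reference point for judging the selection indices rather than as a practical selection method.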

RESULTS

For the Wisconsin breast cancer dataset, the best accuracy of 97.17% was obtained, which is only 0.25% lower than that obtained by exhaustive testing. For the Pima Indians diabetes dataset, the best accuracy of 77.65% was obtained, which is only 0.13% lower than that obtained by exhaustive testing.

CONCLUSION

This paper has shown that feature selection matters when mining medical data, both for reducing processing time and for increasing classification accuracy. However, not all combinations of feature selection and modeling methods are equally effective, and the best combination is often data-dependent, as supported by the breast cancer and diabetes data analyzed in this paper.
