Yilmaz Nihat, Inan Onur, Uzer Mustafa Serter
Electrical-Electronics Engineering Department, Engineering Faculty, Selcuk University, Konya, Turkey
J Med Syst. 2014 May;38(5):48. doi: 10.1007/s10916-014-0048-7. Epub 2014 Apr 16.
Noisy and inconsistent data in databases are the most important factors that prevent pattern recognition from working rapidly and effectively. This article presents a new data preparation method, based on clustering algorithms, for the diagnosis of heart disease and diabetes. In this method, a new modified K-means algorithm drives a clustering-based data preparation system that eliminates noisy and inconsistent data, and Support Vector Machines are used for classification. The approach was tested on the diagnosis of heart disease and diabetes, which are prevalent in society and are among the leading causes of death. The data sets used are the Statlog (Heart), SPECT images, and Pima Indians Diabetes data sets from the UCI database. The proposed system achieved classification success rates of 97.87%, 98.18%, and 96.71% on these data sets, respectively. Classification accuracies were obtained using 10-fold cross-validation. According to these results, the proposed method performs very well compared with other reported results and seems very promising for pattern recognition applications.
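The abstract does not say how the K-means algorithm was modified or how noisy samples are identified, so the following is only a minimal sketch of the general idea of clustering-based data cleaning, not the authors' method. It assumes plain Lloyd's K-means with a deterministic farthest-point initialization, and a hypothetical filtering rule: a sample is kept only if its class label matches the majority label of its cluster. The function names and the toy data are illustrative assumptions; the kept samples would then be passed to an SVM classifier evaluated with 10-fold cross-validation, which is not shown here.

```python
from collections import Counter


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=100):
    """Standard Lloyd's K-means (NOT the paper's modified variant).

    Returns the cluster index assigned to each point.
    """
    # Assumption: deterministic farthest-point initialization.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        )
    assign = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        new_assign = [
            min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points
        ]
        if new_assign == assign:
            break  # assignments stopped changing
        assign = new_assign
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assign


def consistent_indices(points, labels, k):
    """Hypothetical rule: keep samples whose label equals their cluster's majority label."""
    assign = kmeans(points, k)
    majority = {
        c: Counter(l for l, a in zip(labels, assign) if a == c).most_common(1)[0][0]
        for c in set(assign)
    }
    return [i for i, (l, a) in enumerate(zip(labels, assign)) if l == majority[a]]


# Toy data (made up for illustration): two well-separated groups,
# each containing one sample whose label disagrees with its neighbours.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
          (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
labels = [0, 0, 0, 0, 1,
          1, 1, 1, 1, 0]

kept = consistent_indices(points, labels, k=2)
print(kept)  # indices of the samples that survive the filter
```

On this toy set, the two samples whose labels disagree with the rest of their cluster are the ones dropped. A majority-label rule is only one possible choice; a distance-to-centroid threshold would be another, and the abstract does not indicate which, if either, the authors used.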