Gu Xiaoqing, Ni Tongguang, Wang Hongyuan
School of Information Science and Engineering, Changzhou University, Changzhou 213164, China.
ScientificWorldJournal. 2014 Mar 23;2014:536434. doi: 10.1155/2014/536434. eCollection 2014.
Support vector machines (SVMs) are considered among the most successful methods for medical dataset classification. However, most real-world medical datasets contain outliers or noise, and their classes are often imbalanced. This paper presents a fuzzy support vector machine (FSVM) for the class imbalance problem, called FSVM-CIP. It can be seen as a modified class of FSVM obtained by extending manifold regularization and assigning a separate misclassification cost to each of the two classes. The proposed FSVM-CIP can handle the class imbalance problem in the presence of outliers/noise and enhances the locality maximum margin. Five real-world medical datasets from the UCI medical database (breast, heart, hepatitis, BUPA liver, and Pima diabetes) are used to illustrate the method. Experimental results on these datasets show that FSVM-CIP achieves superior or comparable effectiveness.
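The abstract names two ingredients, fuzzy memberships that down-weight outliers and separate misclassification costs for the two classes. A minimal sketch of how they could combine into one per-sample penalty is given below. It is not the authors' formulation: the distance-to-centroid membership is a common FSVM heuristic assumed here for illustration, the manifold regularization term is omitted entirely, and the function name `fuzzy_costs`, the parameters `c_pos`, `c_neg`, and `delta`, and the toy data are all hypothetical.

```python
import math


def fuzzy_costs(X, y, c_pos=1.0, c_neg=1.0, delta=0.5):
    """Return one penalty per sample: class cost times fuzzy membership.

    Assumed heuristic (not taken from the paper): the membership of a
    sample is s = 1 - d / (r + delta), where d is its distance to its own
    class centroid and r is the largest such distance in that class.
    delta > 0 keeps every membership strictly positive.
    """
    class_cost = {+1: c_pos, -1: c_neg}
    centroid, radius = {}, {}
    for label in class_cost:
        pts = [x for x, t in zip(X, y) if t == label]
        dim = len(pts[0])
        center = [sum(p[d] for p in pts) / len(pts) for d in range(dim)]
        centroid[label] = center
        radius[label] = max(math.dist(p, center) for p in pts)

    costs = []
    for x, t in zip(X, y):
        membership = 1.0 - math.dist(x, centroid[t]) / (radius[t] + delta)
        costs.append(class_cost[t] * membership)
    return costs


# Toy data: five majority (-1) samples, the last one an outlier,
# and two minority (+1) samples.
X = [(0.0, 0.0), (0.2, 0.0), (0.0, 0.2), (0.2, 0.2), (3.0, 3.0),
     (1.0, 1.0), (1.2, 1.0)]
y = [-1, -1, -1, -1, -1, +1, +1]

# Assumption: the minority-class cost is scaled by the imbalance ratio.
n_pos = sum(1 for t in y if t == +1)
n_neg = len(y) - n_pos
costs = fuzzy_costs(X, y, c_pos=n_neg / n_pos, c_neg=1.0)
print(costs)
```

In this sketch the majority-class outlier receives the smallest penalty, while the minority samples receive the largest, so a solver that accepts per-sample penalties would be discouraged from fitting the outlier and from ignoring the minority class. One such solver is scikit-learn's `SVC`, whose `fit` method takes a `sample_weight` argument that rescales `C` per sample.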