Sui Yuan, Wei Ying, Zhao Dazhe
Software College, Northeastern University, Shenyang 110004, China.
School of Information Science and Engineering, Northeastern University, Shenyang 110004, China; Key Laboratory of Medical Imaging Calculation of the Ministry of Education, Shenyang 110004, China.
Comput Math Methods Med. 2015;2015:368674. doi: 10.1155/2015/368674. Epub 2015 Apr 6.
In lung cancer computer-aided detection/diagnosis (CAD) systems, classification of regions of interest (ROIs) is often used to detect and diagnose lung nodules accurately. However, unbalanced datasets often degrade classification performance. In this paper, both the minority and majority classes are resampled to improve generalization ability. We propose a novel SVM classifier combined with random undersampling (RU) and SMOTE for lung nodule recognition. Combining the two resampling methods not only yields a balanced training set but also removes noise and duplicate information from the training samples while retaining useful information. This improves effective data utilization and hence the performance of the SVM algorithm for pulmonary nodule classification on unbalanced data. Eight features, including 2D and 3D features, are extracted for training and classification. Experimental results show that, for different sizes of training datasets, our RU-SMOTE-SVM classifier achieves the highest classification accuracy among the four classifiers compared, and the average classification accuracy is more than 92.94%.
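The resampling idea described in the abstract (random undersampling of the majority class plus SMOTE oversampling of the minority class) can be illustrated with a minimal standard-library sketch. It is not the authors' implementation: the function names, the common target size of 60 samples per class, the neighbour count `k=5`, and the randomly generated 8-dimensional feature vectors are all assumptions made for illustration, and the SVM training step is omitted.

```python
import random


def random_undersample(majority, n_keep, rng):
    """Keep a random subset of n_keep majority-class samples (no replacement)."""
    return rng.sample(majority, n_keep)


def smote(minority, n_new, k, rng):
    """Create n_new synthetic minority samples. Each one is interpolated
    between a randomly chosen minority sample and one of its k nearest
    minority-class neighbours (squared Euclidean distance)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest neighbours of base, excluding base itself.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: sq_dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


def ru_smote(majority, minority, target, k=5, seed=0):
    """Bring both classes to `target` samples: undersample the majority
    class and add synthetic samples to the minority class.
    Requires at least two minority samples."""
    rng = random.Random(seed)
    maj = random_undersample(majority, min(target, len(majority)), rng)
    n_new = max(0, target - len(minority))
    mino = list(minority) + smote(minority, n_new, k, rng)
    return maj, mino


# Hypothetical data: 200 non-nodule and 20 nodule ROIs, 8 features each.
data_rng = random.Random(42)
non_nodules = [tuple(data_rng.gauss(0.0, 1.0) for _ in range(8)) for _ in range(200)]
nodules = [tuple(data_rng.gauss(2.0, 1.0) for _ in range(8)) for _ in range(20)]

balanced_majority, balanced_minority = ru_smote(non_nodules, nodules, target=60)
print(len(balanced_majority), len(balanced_minority))
```

The two balanced lists would then be labelled and passed to an SVM trainer of the reader's choice. Because each synthetic sample is a convex combination of two real minority samples, it always lies on the line segment between them, which is why SMOTE adds variety without leaving the region the minority class already occupies.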