School of Information and Technology, Northwest University, Xi'an, China.
School of Information Engineering, Yulin University, Yulin, China.
Comput Assist Surg (Abingdon). 2019 Oct;24(sup2):62-72. doi: 10.1080/24699322.2019.1649074. Epub 2019 Aug 12.
To address the two-class imbalanced classification problem that arises in breast cancer diagnosis, we propose RK-SVM, a hybrid model based on sample selection that combines Random Over Sampling Example (ROSE), K-means and Support Vector Machine (SVM). ROSE is used to balance the dataset, which in turn improves the diagnostic accuracy of the SVM. Sample selection is driven by a clustering-based factor that favors samples near the class boundary. The purpose of clustering here is to reduce the risk of removing useful samples and to improve the efficiency of sample selection. To test the performance of the new hybrid classifier, it is evaluated on breast cancer datasets and on three other datasets from the University of California Irvine (UCI) machine learning repository, which are commonly used in class-imbalanced learning. Extensive experimental results show that the proposed hybrid method outperforms most of the competing algorithms in terms of G-mean and accuracy. The results also show that the method performs well on binary classification problems.
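The abstract reports results in terms of G-mean and accuracy but does not define either index. The sketch below assumes the definition of G-mean that is standard in class-imbalanced learning, the geometric mean of sensitivity and specificity, and shows why it is usually reported alongside accuracy. The function names (`confusion_counts`, `accuracy`, `g_mean`) and the toy labels are illustrative assumptions, not taken from the paper.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fn, tn, fp) for a two-class problem."""
    tp = fn = tn = fp = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        else:
            if p == positive:
                fp += 1
            else:
                tn += 1
    return tp, fn, tn, fp


def accuracy(y_true, y_pred, positive=1):
    """Fraction of samples classified correctly."""
    tp, fn, tn, fp = confusion_counts(y_true, y_pred, positive)
    total = tp + fn + tn + fp
    return (tp + tn) / total if total else 0.0


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity and specificity (assumed definition)."""
    tp, fn, tn, fp = confusion_counts(y_true, y_pred, positive)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


# Toy imbalanced data (made up for illustration): 10 positives, 90 negatives.
y_true = [1] * 10 + [0] * 90

# A classifier that always predicts the majority class.
always_negative = [0] * 100

# A classifier that finds 8 of 10 positives and 81 of 90 negatives.
boundary_aware = [1] * 8 + [0] * 2 + [0] * 81 + [1] * 9

print(accuracy(y_true, always_negative), g_mean(y_true, always_negative))
print(accuracy(y_true, boundary_aware), g_mean(y_true, boundary_aware))
```

On this toy data the majority-class predictor scores slightly higher on accuracy than the second classifier, yet its G-mean is zero because it never detects the minority class. This is the usual reason for reporting both indices on imbalanced data.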