
A Novel Ensemble Method for Imbalanced Data Learning: Bagging of Extrapolation-SMOTE SVM.

Author Information

Wang Qi, Luo ZhiHao, Huang JinCai, Feng YangHe, Liu Zhong

Affiliation

Science and Technology on Information Systems Engineering Laboratory, College of Information System and Management, National University of Defense Technology, Changsha, Hunan, China.

Publication Information

Comput Intell Neurosci. 2017;2017:1827016. doi: 10.1155/2017/1827016. Epub 2017 Jan 30.

Abstract

Class imbalance exists ubiquitously in real life and has attracted much interest from various domains. Learning directly from an imbalanced dataset may yield unsatisfying results, since the learner overfocuses on overall classification accuracy and derives a suboptimal model. Various methodologies have been developed to tackle this problem, including sampling, cost-sensitive learning, and hybrid approaches. However, the samples near the decision boundary, which carry more discriminative information, should be valued, and the skew of the boundary can be corrected by constructing synthetic samples. Motivated by this geometric intuition, we designed a new synthetic minority oversampling technique that incorporates borderline information. Moreover, ensemble models tend in practice to capture more complicated and robust decision boundaries. Taking these factors into consideration, a novel ensemble method, called Bagging of Extrapolation Borderline-SMOTE SVM (BEBS), is proposed for imbalanced data learning (IDL) problems. Experiments on open-access datasets showed significantly superior performance of our model, and a persuasive, intuitive explanation of the method is provided. As far as we know, this is the first model that combines an ensemble of SVMs with borderline information for this setting.
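The abstract only outlines the BEBS pipeline, so the following is a minimal illustrative sketch of the general idea rather than the authors' implementation: minority samples near the decision boundary are identified, synthetic samples are generated by extrapolating slightly beyond them, and several SVMs are bagged over rebalanced bootstraps. It assumes numpy and scikit-learn; the function names, the danger-point criterion, the extrapolation gap, and the 0/1 label convention are assumptions for illustration and may differ from the paper's exact algorithm.

```python
# Illustrative sketch of an "extrapolation borderline-SMOTE + bagged SVMs" scheme.
# NOT the authors' code; hyperparameters and the borderline rule are assumptions.
import numpy as np
from sklearn.svm import SVC
from sklearn.neighbors import NearestNeighbors


def extrapolation_borderline_smote(X, y, minority_label=1, k=5, n_new=100, step=0.5, rng=None):
    """Create synthetic minority samples by extrapolating from minority neighbors
    through 'borderline' minority points (a simplified reading of the idea)."""
    rng = np.random.default_rng(rng)
    X_min = X[y == minority_label]
    # Borderline minority points: at least half (but not all) of their k nearest
    # neighbors in the full dataset belong to the majority class.
    _, idx = NearestNeighbors(n_neighbors=k + 1).fit(X).kneighbors(X_min)
    maj_frac = (y[idx[:, 1:]] != minority_label).mean(axis=1)  # drop self-neighbor
    borderline = X_min[(maj_frac >= 0.5) & (maj_frac < 1.0)]
    if len(borderline) == 0:
        return X, y
    # Extrapolate: start at a minority neighbor, go through the borderline point,
    # and continue a bit past it (gap in (1, 1 + step)), i.e., toward the boundary.
    _, min_idx = NearestNeighbors(n_neighbors=min(k, len(X_min))).fit(X_min).kneighbors(borderline)
    synth = []
    for _ in range(n_new):
        i = rng.integers(len(borderline))
        j = min_idx[i, rng.integers(min_idx.shape[1])]
        gap = 1.0 + step * rng.random()
        synth.append(X_min[j] + gap * (borderline[i] - X_min[j]))
    X_new = np.vstack([X, np.array(synth)])
    y_new = np.concatenate([y, np.full(len(synth), minority_label)])
    return X_new, y_new


def bagged_bebs_svms(X, y, n_estimators=10, seed=0, **smote_kw):
    """Bagging: each SVM is trained on a bootstrap rebalanced by the step above."""
    rng = np.random.default_rng(seed)
    models = []
    for _ in range(n_estimators):
        boot = rng.integers(0, len(X), size=len(X))
        Xb, yb = extrapolation_borderline_smote(X[boot], y[boot], rng=rng, **smote_kw)
        models.append(SVC(kernel="rbf", gamma="scale").fit(Xb, yb))
    return models


def predict_majority_vote(models, X):
    """Combine the bagged SVMs by majority vote (assumes 0/1 labels, minority = 1)."""
    votes = np.stack([m.predict(X) for m in models])
    return (votes.mean(axis=0) >= 0.5).astype(int)
```

In this sketch, extrapolating beyond the borderline points (gap greater than 1) is what pushes synthetic minority samples toward the majority side and thereby shifts the learned boundary, while bagging over resampled bootstraps smooths the resulting ensemble decision.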

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0d0d/5304315/f04b744bd5a5/CIN2017-1827016.001.jpg
