Maximum Decentral Projection Margin Classifier for High Dimension and Low Sample Size Problems

Author information

Zhang Zhiwang, He Jing, Cao Jie, Li Shuqing

Affiliations

College of Information Engineering, Nanjing University of Finance and Economics, Nanjing 210023, China.

Department of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210023, China.

Publication information

Neural Netw. 2023 Jan;157:147-159. doi: 10.1016/j.neunet.2022.10.017. Epub 2022 Oct 22.

Abstract

Compared with the relatively easy creation or generation of features in data analysis, manual data labeling demands considerable time and effort in most cases. Even when automated data labeling helps in some settings, the labeling results still need to be checked and verified manually. High Dimension and Low Sample Size (HDLSS) data are therefore very common in data mining and machine learning. For classification problems on HDLSS data, many traditional classifiers give poor predictive performance because of data piling and the approximate equidistance between any two input points in high-dimensional space. In this paper, we propose a Maximum Decentral Projection Margin Classifier (MDPMC) within the framework of a Support Vector Classifier (SVC). In the MDPMC model, in addition to maximizing the margin between the two supporting hyperplanes, constraints that maximize the projection distance between the decentralized input points and their supporting hyperplane are integrated into the SVC model. Experiments on ten real HDLSS datasets show that the proposed MDPMC approach handles the data-piling and approximate-equidistance problems well. Compared with SVC with a Linear Kernel (SVC-LK) and with a Radial Basis Function Kernel (SVC-RBFK), Distance Weighted Discrimination (DWD), weighted DWD (wDWD), Distance-Weighted Support Vector Machine (DWSVM), Population-Guided Large Margin Classifier (PGLMC), and Data Maximum Dispersion Classifier (DMDC), MDPMC obtains better predictive accuracy and lower classification errors than these seven classifiers on HDLSS data.
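The approximate-equidistance phenomenon the abstract refers to is easy to reproduce numerically. The sketch below (not from the paper; random Gaussian data, arbitrary sizes) measures the relative spread of pairwise Euclidean distances and shows it shrinking as the dimension grows far beyond the sample size, which is the HDLSS regime that degrades distance-based classifiers.

```python
import numpy as np

rng = np.random.default_rng(0)

def distance_concentration(n_samples: int, n_features: int) -> float:
    """Coefficient of variation of pairwise distances for random Gaussian points."""
    X = rng.standard_normal((n_samples, n_features))
    # All pairwise Euclidean distances (upper triangle excludes self-distances).
    diff = X[:, None, :] - X[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))
    dists = d[np.triu_indices(n_samples, k=1)]
    return dists.std() / dists.mean()

# The relative spread shrinks as dimension grows: points become nearly equidistant.
for p in (2, 100, 10_000):
    print(f"dim={p:>6}  spread={distance_concentration(20, p):.3f}")
```

In low dimensions the spread is substantial, while at 10,000 features it is close to zero, i.e. any two points are almost the same distance apart, which is exactly why margin- or distance-based methods need HDLSS-specific modifications such as the projection constraints described above.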

