Optimized Mahalanobis-Taguchi System for High-Dimensional Small Sample Data Classification.

Affiliation

School of Science, Wuhan University of Technology, Wuhan 430070, China.

Publication information

Comput Intell Neurosci. 2020 Apr 26;2020:4609423. doi: 10.1155/2020/4609423. eCollection 2020.

Abstract

The Mahalanobis-Taguchi system (MTS) is a multivariate data diagnosis and prediction technique that is widely applied to large-sample or imbalanced data but is rarely used for high-dimensional small-sample data. This paper discusses an optimized MTS for classifying high-dimensional small-sample data from two aspects: the instability of the inverse of the covariance matrix and the instability of feature selection. First, based on regularization and smoothing techniques, a modified Mahalanobis metric is proposed for calculating the Mahalanobis distance, with the aim of reducing the influence of inverse-matrix instability under small-sample conditions. Second, the minimum redundancy-maximum relevance (mRMR) algorithm is introduced into the MTS to address the instability of feature selection. Combining the mRMR algorithm with the signal-to-noise ratio (SNR) yields a two-stage feature selection method: the mRMR algorithm first removes noisy and redundant variables, and the orthogonal table and SNR then screen for the combination of variables that contributes substantially to classification. The feasibility and simplicity of the optimized MTS are then demonstrated on five datasets from the UCI database. The Mahalanobis distance based on regularization and smoothing techniques (RS-MD) is more robust than the traditional Mahalanobis distance, and the two-stage feature selection method improves the effectiveness of feature selection for MTS. Finally, the optimized MTS is applied to email classification on the Spambase dataset. The results show that the optimized MTS outperforms the classical MTS and three other machine learning algorithms.
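To make the inverse-matrix problem concrete, the following is a minimal sketch of one common regularization: shrinking the sample covariance toward a scaled identity matrix before computing the Mahalanobis distance. It is not the paper's RS-MD, whose exact form the abstract does not give. The function names, the shrinkage weight `alpha`, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def shrunk_covariance(X, alpha=0.1):
    """Shrink the sample covariance toward a scaled identity matrix.

    `alpha` is a hypothetical shrinkage weight in [0, 1], not a value
    taken from the paper.
    """
    S = np.cov(X, rowvar=False)             # sample covariance, features in columns
    p = S.shape[0]
    target = (np.trace(S) / p) * np.eye(p)  # identity scaled to the average variance
    return (1.0 - alpha) * S + alpha * target


def scaled_mahalanobis(x, mean, cov):
    """Squared Mahalanobis distance divided by the number of features."""
    diff = x - mean
    # Solve cov @ z = diff rather than forming the explicit inverse.
    return float(diff @ np.linalg.solve(cov, diff)) / diff.size


# Synthetic "normal" group: 8 samples, 20 features, so the plain
# sample covariance is rank-deficient and cannot be inverted.
rng = np.random.default_rng(0)
X_normal = rng.normal(size=(8, 20))
mean = X_normal.mean(axis=0)

sample_cov = np.cov(X_normal, rowvar=False)
cov = shrunk_covariance(X_normal, alpha=0.2)

md_inlier = scaled_mahalanobis(X_normal[0], mean, cov)
md_outlier = scaled_mahalanobis(mean + 5.0, mean, cov)  # shifted far from the group
print(md_inlier, md_outlier)
```

With fewer samples than features the sample covariance has rank at most n - 1, so any strictly positive `alpha` is what makes the linear solve well defined here. How to choose that weight is a separate question that this sketch does not address.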

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6086/7199641/5fa2a49fca4e/CIN2020-4609423.001.jpg
