Adaptive Maximum Entropy Graph-Guided Fast Locality Discriminant Analysis.

Publication information

IEEE Trans Cybern. 2023 Jun;53(6):3574-3587. doi: 10.1109/TCYB.2021.3125956. Epub 2023 May 17.

Abstract

Linear discriminant analysis (LDA) aims to find a low-dimensional space in which data points of the same class lie close to each other while data points from different classes are kept apart. To improve the robustness of LDA to data with non-Gaussian distributions, most existing discriminant analysis methods extend LDA by approximating the underlying manifold of the data. However, these methods suffer from two problems: 1) local affinity or reconstruction coefficients are learned from the relationships among all data pairs, which sharply increases the amount of computation, and 2) they learn the manifold information in the original space, ignoring the interference of noise and redundant features. Motivated by these challenges, this article presents a novel discriminant analysis model, called fast and adaptive locality discriminant analysis (FALDA), to improve efficiency and robustness. First, with an anchor-based strategy, a bipartite graph is constructed for each class to characterize the local structure of the data. Since the number of anchor points is far smaller than the number of data points, learning the fuzzy membership relationships between data points and anchor points within each class saves training time. Second, a maximum entropy regularization is introduced to control the uniformity of the graph weights and avoid the trivial solution. Third, these relationships are updated adaptively during dimensionality reduction, which suppresses the interference of noise and redundant features. Fourth, a whitening constraint is imposed on the projection matrix to remove the correlation between features and restrict the total scatter of the data in the subspace. Finally, data with a complex distribution can be explicitly divided into sub-blocks according to the learned anchor points (or subclass center points). We test the proposed method on synthetic data, benchmark datasets, and imbalanced datasets. Promising experimental results demonstrate the effectiveness of the model.
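The abstract does not state the objective function, so the following is only a minimal sketch of how an entropy-regularized membership between data points and anchor points can be computed in general. It is not the FALDA update itself. It assumes squared Euclidean distances and a fixed set of anchors, whereas the abstract describes memberships learned within each class and updated adaptively during dimensionality reduction. The function name `max_entropy_memberships` and the parameter `gamma` are hypothetical. Under these assumptions, minimizing the distance-weighted cost plus `gamma` times the negative entropy of each row, subject to each row summing to one, has a softmax closed form. A larger `gamma` pushes the weights toward uniform.

```python
import numpy as np


def max_entropy_memberships(X, anchors, gamma=1.0):
    """Row-stochastic memberships of data points to anchor points.

    Illustrative assumption, not the paper's exact formulation. For each
    point i this solves
        min_s  sum_j s_ij * d_ij + gamma * sum_j s_ij * log(s_ij)
        s.t.   sum_j s_ij = 1,  s_ij >= 0,
    whose closed-form solution is s_ij proportional to exp(-d_ij / gamma).
    """
    X = np.asarray(X, dtype=float)
    anchors = np.asarray(anchors, dtype=float)
    # Squared Euclidean distances, shape (n_points, n_anchors).
    d = ((X[:, None, :] - anchors[None, :, :]) ** 2).sum(axis=2)
    # Shift by the row minimum before exponentiating for numerical stability;
    # the shift cancels out in the normalization below.
    logits = -(d - d.min(axis=1, keepdims=True)) / gamma
    s = np.exp(logits)
    return s / s.sum(axis=1, keepdims=True)


# Hypothetical usage: six random 2-D points, the first two reused as anchors.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(6, 2))
S_demo = max_entropy_memberships(X_demo, X_demo[:2], gamma=0.5)
print(S_demo.shape)        # one row per point, one column per anchor
print(S_demo.sum(axis=1))  # each row sums to 1 up to rounding
```

Because the membership matrix has one column per anchor rather than one per data point, its size grows with the number of anchors, which is consistent with the training-time saving the abstract attributes to the anchor-based strategy.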

