He Ping, Xu Xiaohua, Chen Suquan
IEEE Trans Neural Netw Learn Syst. 2025 Apr;36(4):6829-6842. doi: 10.1109/TNNLS.2024.3409394. Epub 2025 Apr 4.
High-dimensional data present significant challenges, such as inadequate sample size, abundant noise, and the curse of dimensionality, which make many traditional classification algorithms inapplicable. Valid inference for such data requires a noise-free low-dimensional representation that preserves both the underlying manifold structure and the discriminative information. However, existing methods often fail to fully account for these requirements. In this article, we introduce a robust supervised spline embedding (RS2E) algorithm for high-dimensional classification. The proposed approach has four main features: 1) it preserves the class-aware submanifold structure in the thin plate spline embedding space; 2) it eliminates noise and outliers to recover the clean manifold by exploiting the manifold's intrinsic low complexity; 3) it separates the class-aware submanifolds by maximizing the distance between each data point and the marginal data points of other class-aware submanifolds; and 4) it applies the alternating direction method of multipliers (ADMM) with generalized power iteration to solve the objective function. Experimental results on real-world, generative adversarial network (GAN)-generated, and artificially corrupted datasets demonstrate that RS2E outperforms other supervised dimensionality reduction algorithms in terms of classification accuracy.
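The abstract names generalized power iteration (GPI) as the inner solver used within ADMM but does not state the subproblem it is applied to. As an illustration only, the sketch below shows GPI in its commonly described form for a quadratic problem with an orthogonality constraint: minimize tr(WᵀAW) − 2 tr(WᵀB) subject to WᵀW = I. The matrices `A` and `B`, the function names, and the stopping rule are assumptions made for this example and are not taken from the RS2E paper.

```python
import numpy as np


def gpi_objective(A, B, W):
    """Value of tr(W^T A W) - 2 tr(W^T B)."""
    return np.trace(W.T @ A @ W) - 2.0 * np.trace(W.T @ B)


def gpi_stiefel(A, B, W0=None, n_iter=500, tol=1e-12, seed=0):
    """Generalized power iteration sketch (hypothetical helper).

    Approximately minimizes tr(W^T A W) - 2 tr(W^T B) subject to
    W^T W = I, for a symmetric d x d matrix A and a d x k matrix B.
    """
    A = 0.5 * (A + A.T)                      # enforce symmetry
    d, k = B.shape
    # Shift A so that alpha * I - A is positive definite.
    alpha = np.linalg.eigvalsh(A)[-1] + 1e-6
    A_shift = alpha * np.eye(d) - A
    if W0 is None:
        # Random orthonormal starting point.
        rng = np.random.default_rng(seed)
        W0, _ = np.linalg.qr(rng.standard_normal((d, k)))
    W = W0
    prev = gpi_objective(A, B, W)
    for _ in range(n_iter):
        M = 2.0 * A_shift @ W + 2.0 * B
        # Project M back onto the constraint set via its thin SVD.
        U, _, Vt = np.linalg.svd(M, full_matrices=False)
        W = U @ Vt
        cur = gpi_objective(A, B, W)
        if prev - cur < tol:                 # objective stopped decreasing
            break
        prev = cur
    return W
```

Each pass replaces `W` with the orthonormal factor of `2(αI − A)W + 2B`, which keeps the constraint satisfied exactly. Within an ADMM scheme, a routine of this kind would typically handle only the orthogonality-constrained variable, with the remaining variables updated in separate steps.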