A Preprocessing Manifold Learning Strategy Based on t-Distributed Stochastic Neighbor Embedding.

Authors

Shi Sha, Xu Yefei, Xu Xiaoyang, Mo Xiaofan, Ding Jun

Affiliations

State Key Laboratory of Integrated Services Network, Xidian University, 2 South TaiBai Road, Xi'an 710071, China.

National Astronomical Observatories, Chinese Academy of Sciences, 20A Datun Road, Chaoyang District, Beijing 100101, China.

Publication

Entropy (Basel). 2023 Jul 14;25(7):1065. doi: 10.3390/e25071065.

Abstract

In machine learning and data analysis, dimensionality reduction and high-dimensional data visualization can be accomplished by manifold learning with the t-Distributed Stochastic Neighbor Embedding (t-SNE) algorithm. We significantly improve this manifold learning scheme by introducing a preprocessing strategy for t-SNE. In the preprocessing step, we first use Laplacian eigenmaps to reduce the dimensionality of the data, which aggregates each data cluster and markedly reduces the Kullback-Leibler divergence (KLD). The k-nearest-neighbor (KNN) algorithm is also used in the preprocessing to enhance visualization performance and to reduce computational and space complexity. We compare our strategy with standard t-SNE on the MNIST dataset. The experimental results show that our strategy separates different clusters more clearly and keeps data of the same kind much closer together. Moreover, the KLD is reduced by about 30% at the cost of only a 1-2% increase in runtime.
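The preprocessing idea in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the abstract does not say how the KNN step and the Laplacian eigenmaps are combined, so the binary edge weights, the neighbor count `k`, the synthetic data, and the function names `knn_adjacency`, `laplacian_eigenmaps`, and `kl_divergence` are all assumptions made for illustration. The reduced output `Y` would then be handed to a t-SNE implementation in place of the raw data.

```python
import numpy as np


def knn_adjacency(X, k):
    """Symmetric 0/1 adjacency matrix of the k-nearest-neighbor graph of X."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T  # squared Euclidean distances
    np.fill_diagonal(d2, np.inf)                     # a point is not its own neighbor
    nbrs = np.argsort(d2, axis=1)[:, :k]             # indices of the k closest points
    n = X.shape[0]
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), k), nbrs.ravel()] = 1.0
    return np.maximum(W, W.T)                        # edge i-j if either lists the other


def laplacian_eigenmaps(X, n_components, k=10):
    """Embed X with Laplacian eigenmaps on a KNN graph (binary weights assumed)."""
    W = knn_adjacency(X, k)
    deg = W.sum(axis=1)                              # every node has degree >= k
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    # Symmetric normalized Laplacian: I - D^{-1/2} W D^{-1/2}
    L_sym = np.eye(len(X)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    vals, vecs = np.linalg.eigh(L_sym)               # eigenvalues in ascending order
    # Map back to solutions of L v = lambda D v and drop the first (trivial) one.
    return (d_inv_sqrt[:, None] * vecs)[:, 1:n_components + 1]


def kl_divergence(P, Q, eps=1e-12):
    """KL(P || Q) for two joint probability matrices that each sum to 1."""
    mask = P > 0
    return float(np.sum(P[mask] * np.log(P[mask] / np.maximum(Q[mask], eps))))


# Synthetic stand-in for image data: two Gaussian clusters in 50 dimensions.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (60, 50)), rng.normal(6.0, 1.0, (60, 50))])
Y = laplacian_eigenmaps(X, n_components=10, k=10)
print(Y.shape)  # (120, 10)
```

The sketch drops only one trivial eigenvector, which is adequate when the KNN graph is connected. On a disconnected graph, further near-constant components remain and simply encode which connected component a point belongs to. `kl_divergence` is the quantity t-SNE minimizes between the high-dimensional affinities P and the low-dimensional affinities Q; computing those two matrices is left to the t-SNE implementation.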

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/124e/10378244/d995140595b1/entropy-25-01065-g001.jpg
