Demidova Liliya A, Gorchakov Artyom V
Institute of Information Technologies, Federal State Budget Educational Institution of Higher Education "MIREA-Russian Technological University", 78, Vernadsky Avenue, 119454 Moscow, Russia.
J Imaging. 2022 Apr 15;8(4):113. doi: 10.3390/jimaging8040113.
Researchers often use dimensionality reduction techniques to make high-dimensional data easier to interpret visually, because data can only be visualized directly in low-dimensional spaces. Recent research in nonlinear dimensionality reduction has introduced many effective algorithms, including t-distributed stochastic neighbor embedding (t-SNE), uniform manifold approximation and projection (UMAP), a dimensionality reduction technique based on triplet constraints (TriMAP), and pairwise controlled manifold approximation (PaCMAP), all of which aim to preserve both the local and the global structure of high-dimensional data while reducing its dimensionality. The UMAP algorithm has found application in bioinformatics, genetics, and genomics, and has been widely used to improve the accuracy of other machine learning algorithms. In this research, we compare the performance of different fuzzy information discrimination measures used as loss functions in the UMAP algorithm when constructing low-dimensional embeddings. To this end, we derive the gradients of the considered losses analytically and optimize each loss function with the Adam algorithm. From the experimental studies we conclude that using either the logarithmic fuzzy cross entropy loss without reduced repulsion or the symmetric logarithmic fuzzy cross entropy loss with a sufficiently large neighbor count preserves the global structure of the original multidimensional data better than the loss function used in the original UMAP implementation.
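To make the optimization setup concrete, the following is a minimal sketch of minimizing a fuzzy cross entropy between high-dimensional membership strengths and low-dimensional similarities with an analytically derived gradient and Adam updates. It is not the authors' implementation: the dense pairwise formulation, the similarity kernel w = 1 / (1 + d^2) (the UMAP curve with a = b = 1), the function names `fuzzy_cross_entropy` and `adam_minimize`, and all hyperparameter values are assumptions chosen for illustration. The logarithmic and symmetric loss variants compared in the paper are not defined in this abstract and are not reproduced here.

```python
import numpy as np


def fuzzy_cross_entropy(V, Y, eps=1e-9):
    """Fuzzy cross entropy between memberships V (n x n) and embedding Y (n x dim).

    Assumed low-dimensional similarity: w_ij = 1 / (1 + ||y_i - y_j||^2).
    Terms that depend only on V are constant and omitted.
    Returns the loss and its analytic gradient with respect to Y.
    """
    diff = Y[:, None, :] - Y[None, :, :]          # y_i - y_j, shape (n, n, dim)
    d2 = np.sum(diff ** 2, axis=-1)               # squared pairwise distances
    W = 1.0 / (1.0 + d2)
    off_diag = ~np.eye(len(Y), dtype=bool)        # ignore self-pairs

    Wc = np.clip(W, eps, 1.0 - eps)               # keep the logarithms finite
    terms = V * np.log(Wc) + (1.0 - V) * np.log(1.0 - Wc)
    loss = -np.sum(terms[off_diag])

    # dL/d(d2) per ordered pair: attraction v*w minus repulsion (1 - v)*w/d2,
    # using dw/d(d2) = -w^2 and 1 - w = d2 / (1 + d2).
    G = V * W - (1.0 - V) * W / (d2 + eps)
    G[~off_diag] = 0.0
    # d(d2_ij)/dy_i = 2 (y_i - y_j); each point appears in pairs (i, j) and (j, i).
    grad = 2.0 * np.einsum("ij,ijk->ik", G + G.T, diff)
    return loss, grad


def adam_minimize(V, Y0, steps=300, lr=0.05, beta1=0.9, beta2=0.999, eps=1e-8):
    """Minimize the loss above with plain Adam updates (illustrative settings)."""
    Y = Y0.copy()
    m = np.zeros_like(Y)                          # first-moment estimate
    v = np.zeros_like(Y)                          # second-moment estimate
    losses = []
    for t in range(1, steps + 1):
        loss, g = fuzzy_cross_entropy(V, Y)
        losses.append(loss)
        m = beta1 * m + (1.0 - beta1) * g
        v = beta2 * v + (1.0 - beta2) * g ** 2
        m_hat = m / (1.0 - beta1 ** t)            # bias correction
        v_hat = v / (1.0 - beta2 ** t)
        Y -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return Y, losses


if __name__ == "__main__":
    # Toy memberships (hypothetical data): two groups of three points each.
    V = np.full((6, 6), 0.1)
    V[:3, :3] = 0.9
    V[3:, 3:] = 0.9
    np.fill_diagonal(V, 0.0)
    Y0 = np.random.default_rng(0).normal(size=(6, 2))
    Y, losses = adam_minimize(V, Y0)
    print("initial loss:", losses[0], "final loss:", losses[-1])
```

Swapping in a different discrimination measure would, under these assumptions, only require changing the loss expression and the per-pair derivative `G`; the Adam loop stays the same.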