Fuzzy Information Discrimination Measures and Their Application to Low Dimensional Embedding Construction in the UMAP Algorithm.

Author information

Demidova Liliya A, Gorchakov Artyom V

Affiliations

Institute of Information Technologies, Federal State Budget Educational Institution of Higher Education "MIREA-Russian Technological University", 78, Vernadsky Avenue, 119454 Moscow, Russia.

Publication information

J Imaging. 2022 Apr 15;8(4):113. doi: 10.3390/jimaging8040113.

DOI: 10.3390/jimaging8040113

PMID: 35448241

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9028155/

Abstract

Dimensionality reduction techniques are often used by researchers to make high-dimensional data easier to interpret visually, because data can only be visualized in low-dimensional spaces. Recent research in nonlinear dimensionality reduction has introduced many effective algorithms that aim to preserve both the local and the global structure of high-dimensional data while reducing its dimensionality. These include t-distributed stochastic neighbor embedding (t-SNE), uniform manifold approximation and projection (UMAP), the dimensionality reduction technique based on triplet constraints (TriMAP), and pairwise controlled manifold approximation (PaCMAP). The UMAP algorithm has found applications in bioinformatics, genetics, and genomics, and has been widely used to improve the accuracy of other machine learning algorithms. In this research, we compare the performance of different fuzzy information discrimination measures used as loss functions in the UMAP algorithm while constructing low-dimensional embeddings. To achieve this, we derive the gradients of the considered losses analytically and use the Adam algorithm to optimize the loss function. From the conducted experimental studies, we conclude that two losses preserve the global structure of the original multidimensional data better than the loss function used in the original UMAP implementation. The first is the logarithmic fuzzy cross entropy loss without reduced repulsion. The second is the symmetric logarithmic fuzzy cross entropy loss with a sufficiently large neighbor count.
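The abstract describes three steps: plugging a fuzzy discrimination measure into UMAP as the loss, differentiating it analytically, and minimizing it with Adam. The logarithmic and symmetric variants studied by the authors are not defined in this record. The minimal sketch below therefore uses the standard fuzzy set cross entropy from the original UMAP formulation as a stand-in.

It rests on several assumptions, all of which are illustrative:
- The membership matrix `P` is a hypothetical toy input rather than one built from a k-nearest-neighbor graph.
- The kernel parameters `a` and `b` default to 1. UMAP normally fits them from `min_dist`.
- The loss is summed over all pairs. Reference UMAP instead uses stochastic gradient descent with negative sampling.
- All function names are hypothetical.

```python
import math
import random

EPS = 1e-12  # numerical guard for logarithms and divisions


def low_dim_similarity(s, a=1.0, b=1.0):
    """UMAP-style membership q = 1 / (1 + a * s**b), where s is a squared distance.
    a = b = 1 is an assumption; UMAP fits a and b from min_dist."""
    return 1.0 / (1.0 + a * s ** b)


def _x_log_ratio(x, y):
    """Return x * log(x / y), using the convention 0 * log(0 / y) = 0."""
    return 0.0 if x == 0.0 else x * math.log(x / y)


def fuzzy_cross_entropy(P, Y, a=1.0, b=1.0):
    """Fuzzy set cross entropy between memberships P and embedding Y, summed over pairs i < j.
    Full pairwise sum (no negative sampling), a simplification of reference UMAP."""
    n = len(Y)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            s = sum((yi - yj) ** 2 for yi, yj in zip(Y[i], Y[j]))
            q = min(max(low_dim_similarity(s, a, b), EPS), 1.0 - EPS)
            p = P[i][j]
            total += _x_log_ratio(p, q) + _x_log_ratio(1.0 - p, 1.0 - q)
    return total


def fuzzy_cross_entropy_grad(P, Y, a=1.0, b=1.0):
    """Analytical gradient of fuzzy_cross_entropy with respect to every embedding coordinate."""
    n, dim = len(Y), len(Y[0])
    G = [[0.0] * dim for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            diff = [yi - yj for yi, yj in zip(Y[i], Y[j])]
            s = max(sum(d * d for d in diff), EPS)  # avoids 0 ** negative when b < 1
            q = low_dim_similarity(s, a, b)
            qc = min(max(q, EPS), 1.0 - EPS)
            p = P[i][j]
            dce_dq = -p / qc + (1.0 - p) / (1.0 - qc)  # attraction (p) and repulsion (1 - p)
            dq_ds = -a * b * s ** (b - 1.0) * q * q     # derivative of the kernel w.r.t. s
            coeff = 2.0 * dce_dq * dq_ds                # chain rule: ds/dy_i = 2 (y_i - y_j)
            for k in range(dim):
                G[i][k] += coeff * diff[k]
                G[j][k] -= coeff * diff[k]
    return G


def adam_optimize(P, Y, steps=200, lr=0.05, beta1=0.9, beta2=0.999, eps=1e-8, a=1.0, b=1.0):
    """Minimize the loss with the standard Adam update (bias-corrected moment estimates)."""
    Y = [list(row) for row in Y]  # work on a copy
    m = [[0.0] * len(row) for row in Y]
    v = [[0.0] * len(row) for row in Y]
    for t in range(1, steps + 1):
        G = fuzzy_cross_entropy_grad(P, Y, a, b)
        for i, row in enumerate(Y):
            for k in range(len(row)):
                g = G[i][k]
                m[i][k] = beta1 * m[i][k] + (1 - beta1) * g
                v[i][k] = beta2 * v[i][k] + (1 - beta2) * g * g
                m_hat = m[i][k] / (1 - beta1 ** t)
                v_hat = v[i][k] / (1 - beta2 ** t)
                row[k] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return Y


if __name__ == "__main__":
    # Hypothetical toy memberships: points 0-1 and 2-3 are strongly linked.
    P = [
        [0.0, 1.0, 0.1, 0.0],
        [1.0, 0.0, 0.0, 0.1],
        [0.1, 0.0, 0.0, 1.0],
        [0.0, 0.1, 1.0, 0.0],
    ]
    rng = random.Random(0)
    Y0 = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in P]
    Y = adam_optimize(P, Y0)
    print(fuzzy_cross_entropy(P, Y0), "->", fuzzy_cross_entropy(P, Y))
```

Running the module prints the loss before and after optimization. Because the gradient is assembled through `dce_dq` (the derivative of the per-pair loss with respect to `q`), trying a different discrimination measure would only require changing the per-pair loss term and that one derivative.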

Similar articles

Embedding Functional Brain Networks in Low Dimensional Spaces Using Manifold Learning Techniques.

Front Neuroinform. 2021 Dec 24;15:740143. doi: 10.3389/fninf.2021.740143. eCollection 2021.

A cross entropy test allows quantitative statistical comparison of t-SNE and UMAP representations.

Cell Rep Methods. 2023 Jan 13;3(1):100390. doi: 10.1016/j.crmeth.2022.100390. eCollection 2023 Jan 23.

Evaluation of Distance Metrics and Spatial Autocorrelation in Uniform Manifold Approximation and Projection Applied to Mass Spectrometry Imaging Data.

Anal Chem. 2019 May 7;91(9):5706-5714. doi: 10.1021/acs.analchem.8b05827. Epub 2019 Apr 25.

Dimensionality reduction by UMAP reinforces sample heterogeneity analysis in bulk transcriptomic data.

Cell Rep. 2021 Jul 27;36(4):109442. doi: 10.1016/j.celrep.2021.109442.

Polygenic risk modeling of tumor stage and survival in bladder cancer.

BioData Min. 2022 Sep 30;15(1):23. doi: 10.1186/s13040-022-00306-w.

UMAP as a Dimensionality Reduction Tool for Molecular Dynamics Simulations of Biomacromolecules: A Comparison Study.

J Phys Chem B. 2021 May 20;125(19):5022-5034. doi: 10.1021/acs.jpcb.1c02081. Epub 2021 May 11.

Application of Uniform Manifold Approximation and Projection (UMAP) in spectral imaging of artworks.

Spectrochim Acta A Mol Biomol Spectrosc. 2021 May 5;252:119547. doi: 10.1016/j.saa.2021.119547. Epub 2021 Feb 4.

Parametric UMAP Embeddings for Representation and Semisupervised Learning.

Neural Comput. 2021 Oct 12;33(11):2881-2907. doi: 10.1162/neco_a_01434.

The application of Uniform Manifold Approximation and Projection (UMAP) for unconstrained ordination and classification of biological indicators in aquatic ecology.

Sci Total Environ. 2022 Apr 1;815:152365. doi: 10.1016/j.scitotenv.2021.152365. Epub 2021 Dec 25.

Cited by

AI-Enhanced evaluation of YouTube content on post-surgical incontinence following pelvic cancer treatment.

SSM Popul Health. 2024 May 4;26:101677. doi: 10.1016/j.ssmph.2024.101677. eCollection 2024 Jun.
