Toward a Quantitative Survey of Dimension Reduction Techniques.

Author information

Espadoto Mateus, Martins Rafael M, Kerren Andreas, Hirata Nina S T, Telea Alexandru C

Publication information

IEEE Trans Vis Comput Graph. 2021 Mar;27(3):2153-2173. doi: 10.1109/TVCG.2019.2944182. Epub 2021 Jan 28.

DOI:10.1109/TVCG.2019.2944182

Abstract

Dimensionality reduction methods, also known as projections, are frequently used in multidimensional data exploration in machine learning, data science, and information visualization. Tens of such techniques have been proposed, aiming to address a wide set of requirements, such as ability to show the high-dimensional data structure, distance or neighborhood preservation, computational scalability, stability to data noise and/or outliers, and practical ease of use. However, it is far from clear for practitioners how to choose the best technique for a given use context. We present a survey of a wide body of projection techniques that helps answer this question. For this, we characterize the input data space, projection techniques, and the quality of projections, by several quantitative metrics. We sample these three spaces according to these metrics, aiming at good coverage with bounded effort. We describe our measurements and outline observed dependencies of the measured variables. Based on these results, we draw several conclusions that help compare projection techniques, explain their results for different types of data, and ultimately help practitioners when choosing a projection for a given context. Our methodology, datasets, projection implementations, metrics, visualizations, and results are publicly open, so interested stakeholders can examine and/or extend this benchmark.

Similar articles

Toward a Quantitative Survey of Dimension Reduction Techniques.

IEEE Trans Vis Comput Graph. 2021 Mar;27(3):2153-2173. doi: 10.1109/TVCG.2019.2944182. Epub 2021 Jan 28.

Probing Projections: Interaction Techniques for Interpreting Arrangements and Errors of Dimensionality Reductions.

IEEE Trans Vis Comput Graph. 2016 Jan;22(1):629-38. doi: 10.1109/TVCG.2015.2467717. Epub 2015 Aug 12.

Perception-Based Evaluation of Projection Methods for Multidimensional Data Visualization.

IEEE Trans Vis Comput Graph. 2015 Jan;21(1):81-94. doi: 10.1109/TVCG.2014.2330617.

A visual approach for analysis and inference of molecular activity spaces.

J Cheminform. 2019 Oct 22;11(1):63. doi: 10.1186/s13321-019-0386-z.

Quality metrics in high-dimensional data visualization: an overview and systematization.

IEEE Trans Vis Comput Graph. 2011 Dec;17(12):2203-12. doi: 10.1109/TVCG.2011.229.

Implicit Multidimensional Projection of Local Subspaces.

IEEE Trans Vis Comput Graph. 2021 Feb;27(2):1558-1568. doi: 10.1109/TVCG.2020.3030368. Epub 2021 Jan 28.

Robust subspace methods for outlier detection in genomic data circumvents the curse of dimensionality.

R Soc Open Sci. 2020 Feb 5;7(2):190714. doi: 10.1098/rsos.190714. eCollection 2020 Feb.

UnProjection: Leveraging Inverse-Projections for Visual Analytics of High-Dimensional Data.

IEEE Trans Vis Comput Graph. 2023 Feb;29(2):1559-1572. doi: 10.1109/TVCG.2021.3125576. Epub 2022 Dec 29.

Predict subcellular locations of singleplex and multiplex proteins by semi-supervised learning and dimension-reducing general mode of Chou's PseAAC.

IEEE Trans Nanobioscience. 2013 Dec;12(4):311-20. doi: 10.1109/TNB.2013.2272014.

Current Projection Methods-Induced Biases at Subgroup Detection for Machine-Learning Based Data-Analysis of Biomedical Data.

Int J Mol Sci. 2019 Dec 20;21(1):79. doi: 10.3390/ijms21010079.

Cited by

Energy Landscapes and Structural Plasticity of Intrinsically Disordered Histones.

J Chem Inf Model. 2025 Aug 25;65(16):8679-8687. doi: 10.1021/acs.jcim.4c02269. Epub 2025 Aug 6.

PLoS One. 2025 Apr 21;20(4):e0321114. doi: 10.1371/journal.pone.0321114. eCollection 2025.

Latent Structure in EHR Data: Reconstruction of Diabetes Markers with Sparse NMF.

medRxiv. 2025 Apr 1:2025.03.31.25324972. doi: 10.1101/2025.03.31.25324972.

From High Dimensions to Human Insight: Exploring Dimensionality Reduction for Chemical Space Visualization.

Mol Inform. 2025 Jan;44(1):e202400265. doi: 10.1002/minf.202400265. Epub 2024 Dec 5.

Mcadet: A feature selection method for fine-resolution single-cell RNA-seq data based on multiple correspondence analysis and community detection.

PLoS Comput Biol. 2024 Oct 28;20(10):e1012560. doi: 10.1371/journal.pcbi.1012560. eCollection 2024 Oct.

The art of seeing the elephant in the room: 2D embeddings of single-cell data do make sense.

PLoS Comput Biol. 2024 Oct 2;20(10):e1012403. doi: 10.1371/journal.pcbi.1012403. eCollection 2024 Oct.

Calibrating dimension reduction hyperparameters in the presence of noise.

PLoS Comput Biol. 2024 Sep 12;20(9):e1012427. doi: 10.1371/journal.pcbi.1012427. eCollection 2024 Sep.

A General Framework for Comparing Embedding Visualizations Across Class-Label Hierarchies.

IEEE Trans Vis Comput Graph. 2025 Jan;31(1):283-293. doi: 10.1109/TVCG.2024.3456370. Epub 2024 Dec 3.

TinyNS: Platform-Aware Neurosymbolic Auto Tiny Machine Learning.

ACM Trans Embed Comput Syst. 2024 May;23(3). doi: 10.1145/3603171. Epub 2024 May 11.

ParaDime: A Framework for Parametric Dimensionality Reduction.

Comput Graph Forum. 2023 Jun;42(3):337-348. doi: 10.1111/cgf.14834. Epub 2023 Jun 27.
