Thrun Michael C, Ultsch Alfred
Dept. of Hematology, Oncology and Immunology, Philipps-University of Marburg, Baldingerstraße, D-35043 Marburg.
Databionics Research Group, Philipps-University of Marburg, Hans-Meerwein-Straße 6, Marburg D-35032, Germany.
MethodsX. 2020 Oct 10;7:101093. doi: 10.1016/j.mex.2020.101093. eCollection 2020.
Projections are conventional dimensionality reduction methods for information visualization that transform high-dimensional data into a low-dimensional space. If the projection method restricts the output space to two dimensions, the result is a scatter plot. The goal of this scatter plot is to visualize the relative relationships between high-dimensional data points that build up distance- and density-based structures. However, the Johnson-Lindenstrauss lemma implies that the two-dimensional similarities in the scatter plot do not necessarily represent the high-dimensional structures. Here, a simplified emergent self-organizing map uses the projected points of such a scatter plot in combination with the dataset to compute the generalized U-matrix. The generalized U-matrix defines the visualization of a topographic map that depicts the misrepresentations of projected points with respect to a given dimensionality reduction method and the dataset.
• The topographic map provides accurate information about the distance- and density-based structures of high-dimensional data if an appropriate dimensionality reduction method is selected.
• The topographic map can uncover the absence of distance-based structures.
• The topographic map reveals the number of clusters in a dataset as the number of valleys.
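The idea of reading misrepresentations off a projection can be illustrated with a much cruder stand-in than the simplified emergent self-organizing map described above. The following minimal sketch is not the authors' algorithm and does not compute a generalized U-matrix. It only assigns each projected point a "height" equal to the mean high-dimensional distance to its nearest neighbors in the two-dimensional plane, so that points whose planar neighbors are far away in the data space sit on "mountains". The function names `pairwise_distances` and `neighbor_heights`, the parameter `k`, and the toy data are all assumptions made for illustration.

```python
import numpy as np


def pairwise_distances(A):
    """Euclidean distance matrix between the rows of A."""
    diff = A[:, None, :] - A[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def neighbor_heights(X, P, k=5):
    """Illustrative height per point (an assumption, not the generalized U-matrix):
    mean high-dimensional distance to the k nearest neighbors in the projection P."""
    d_high = pairwise_distances(X)
    d_low = pairwise_distances(P)
    np.fill_diagonal(d_low, np.inf)           # a point is not its own neighbor
    nn = np.argsort(d_low, axis=1)[:, :k]     # indices of k nearest points in 2D
    return np.take_along_axis(d_high, nn, axis=1).mean(axis=1)


# Toy data (hypothetical): two well-separated Gaussian clusters in 10 dimensions.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 1.0, size=(50, 10)),
    rng.normal(8.0, 1.0, size=(50, 10)),
])

# A projection that keeps the clusters apart (first two coordinates) ...
good_P = X[:, :2]
# ... and a placement that ignores the data entirely.
bad_P = rng.uniform(0.0, 1.0, size=(100, 2))

h_good = neighbor_heights(X, good_P)
h_bad = neighbor_heights(X, bad_P)
print("mean height, structure-preserving projection:", h_good.mean())
print("mean height, random placement:", h_bad.mean())
```

In this toy setting the random placement should yield larger heights on average, because planar neighbors often belong to different clusters. The topographic map in the paper goes further by interpolating heights over a grid of neurons, which this sketch does not attempt.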