Graffelman Jan
Department of Statistics and Operations Research, Universitat Politècnica de Catalunya.
Department of Biostatistics, University of Washington.
J Appl Stat. 2020;47(11):2011-2024. doi: 10.1080/02664763.2019.1702929. Epub 2019 Dec 17.
Metric multidimensional scaling (MDS) is a widely used multivariate method with applications in almost all scientific disciplines. The eigenvalues obtained in the analysis are usually reported so that the overall goodness-of-fit of the distance matrix can be calculated. In this paper, we refine MDS goodness-of-fit calculations, proposing additional point and pairwise goodness-of-fit statistics that can be used to filter poorly represented observations in MDS maps. The proposed statistics are especially relevant for large data sets that contain outliers, which typically have many poorly fitted observations; they help improve MDS output and emphasise the most important features of the data set. Several goodness-of-fit statistics are considered, for both Euclidean and non-Euclidean distance matrices. Examples with data from demographic, genetic and geographic studies are shown.
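As a rough illustration of the eigenvalue-based calculation described in the abstract, the sketch below runs classical metric MDS with NumPy and computes an overall goodness-of-fit together with one possible per-point statistic. The function names (`classical_mds`, `overall_fit`, `point_fit`) and the toy data are hypothetical. The per-point measure used here, the share of each point's squared distance to the centroid that the map retains, is an assumption chosen for illustration and is not necessarily the statistic defined in the paper. Dividing by the sum of absolute eigenvalues is one common convention for non-Euclidean matrices, again an assumption.

```python
import numpy as np

def classical_mds(D, k=2):
    """Classical (metric) MDS of a symmetric distance matrix D, keeping k dimensions."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                # double-centred squared distances
    evals, evecs = np.linalg.eigh(B)           # eigh returns ascending eigenvalues
    order = np.argsort(evals)[::-1]            # reorder to descending
    evals, evecs = evals[order], evecs[:, order]
    # Clip at zero so that negative eigenvalues (non-Euclidean case) give no coordinates.
    X = evecs[:, :k] * np.sqrt(np.clip(evals[:k], 0, None))
    return X, evals, B

def overall_fit(evals, k=2):
    """Share of the eigenvalue total captured by the first k dimensions.
    Absolute values in the denominator are an assumed convention for
    matrices with negative eigenvalues."""
    return evals[:k].sum() / np.abs(evals).sum()

def point_fit(X, B):
    """Assumed per-point statistic: squared length of each point in the
    k-dimensional map divided by diag(B), its squared distance to the
    centroid in the full space. Undefined for a point exactly at the centroid."""
    return (X ** 2).sum(axis=1) / np.diag(B)

# Toy data (made up): four points in the x-y plane and two points on the z axis.
P = np.array([[2, 0, 0], [-2, 0, 0], [0, 1, 0],
              [0, -1, 0], [0, 0, 0.5], [0, 0, -0.5]], dtype=float)
D = np.linalg.norm(P[:, None, :] - P[None, :, :], axis=-1)  # Euclidean distances

X, evals, B = classical_mds(D, k=2)
fit_overall = overall_fit(evals, k=2)
fit_points = point_fit(X, B)

# Flag observations that the two-dimensional map represents poorly.
poorly_fitted = np.where(fit_points < 0.5)[0]
print(round(fit_overall, 3), np.round(fit_points, 3), poorly_fitted)
```

In this toy configuration the overall fit is high, yet the two points on the z axis have a per-point fit near zero, which is the kind of observation a filter on point-wise statistics would flag. The 0.5 threshold is arbitrary.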