Department of Computer Science, Duke University, Durham, NC, 27708, USA.
Department of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA.
Commun Biol. 2022 Jul 19;5(1):719. doi: 10.1038/s42003-022-03628-x.
Dimension reduction (DR) algorithms project data from high dimensions to lower dimensions to enable visualization of interesting high-dimensional structure. DR algorithms are widely used for analysis of single-cell transcriptomic data. Despite the widespread use of DR algorithms such as t-SNE and UMAP, these algorithms have characteristics that undermine trust in their output: they do not preserve important aspects of high-dimensional structure, and they are sensitive to arbitrary user choices. Given the importance of gaining insights from DR, DR methods should be evaluated carefully before their results are trusted. In this paper, we introduce and carry out a systematic evaluation of popular DR methods, including t-SNE, art-SNE, UMAP, PaCMAP, TriMap and ForceAtlas2. Our evaluation considers five components: preservation of local structure, preservation of global structure, sensitivity to parameter choices, sensitivity to preprocessing choices, and computational efficiency. This evaluation can help users choose DR tools that align with their scientific goals.
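To make the first evaluation component concrete, the following minimal sketch scores local structure preservation as the average overlap between each point's k nearest neighbors before and after embedding. This is one common proxy, assumed here for illustration; the abstract does not state which metric the authors use, and the function names `knn_indices` and `knn_preservation` are hypothetical.

```python
import math
import random


def knn_indices(points, k):
    """Return, for each point, the set of indices of its k nearest neighbors (Euclidean)."""
    neighbors = []
    for i, p in enumerate(points):
        # Distances from point i to every other point, sorted ascending;
        # ties are broken by index so the result is deterministic.
        dists = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        neighbors.append({j for _, j in dists[:k]})
    return neighbors


def knn_preservation(high, low, k=5):
    """Mean fraction of each point's k high-dimensional neighbors kept in the embedding.

    Illustrative proxy only (an assumption, not the paper's stated metric).
    Returns a value in [0, 1]; 1.0 means every neighborhood is fully preserved.
    """
    if len(high) != len(low):
        raise ValueError("high and low must contain the same number of points")
    if not 0 < k < len(high):
        raise ValueError("k must satisfy 0 < k < number of points")
    high_nn = knn_indices(high, k)
    low_nn = knn_indices(low, k)
    overlaps = [len(h & l) / k for h, l in zip(high_nn, low_nn)]
    return sum(overlaps) / len(overlaps)


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic stand-in for high-dimensional data: 50 points in 10 dimensions.
    high = [[rng.gauss(0, 1) for _ in range(10)] for _ in range(50)]
    # Hypothetical "embedding": keep only the first two coordinates.
    low = [row[:2] for row in high]
    print(knn_preservation(high, low, k=5))
```

In practice `low` would be the output of a DR method such as t-SNE or UMAP run on `high`. The brute-force neighbor search is quadratic in the number of points, so a real single-cell data set would call for an approximate nearest-neighbor index instead.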