Department of ISOM, School of Business and Management, Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong, China.
Department of Statistics and Data Science, University of California, Los Angeles, Los Angeles, CA, USA.
Nat Commun. 2024 Feb 26;15(1):1753. doi: 10.1038/s41467-024-45891-y.
Two-dimensional (2D) embedding methods are crucial for single-cell data visualization. Popular methods such as t-distributed stochastic neighbor embedding (t-SNE) and uniform manifold approximation and projection (UMAP) are commonly used for visualizing cell clusters; however, it is well known that the 2D embeddings produced by t-SNE and UMAP may not reliably reflect the similarities among cell clusters. Motivated by this challenge, we present a statistical method, scDEED, for detecting dubious cell embeddings output by a 2D-embedding method. By calculating a reliability score for every cell embedding based on the similarity between the cell's 2D-embedding neighbors and pre-embedding neighbors, scDEED identifies the cell embeddings with low reliability scores as dubious and those with high reliability scores as trustworthy. Moreover, by minimizing the number of dubious cell embeddings, scDEED provides intuitive guidance for optimizing the hyperparameters of an embedding method. We show the effectiveness of scDEED on multiple datasets for detecting dubious cell embeddings and optimizing the hyperparameters of t-SNE and UMAP.
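The idea of scoring each cell by how well its 2D-embedding neighbors agree with its pre-embedding neighbors can be illustrated with a minimal sketch. This is not the scDEED implementation: the abstract does not specify the scoring formula or how the dubious/trustworthy cutoffs are chosen, so the sketch substitutes a simple k-nearest-neighbor overlap fraction as the score and a user-supplied threshold. The function names (`knn_indices`, `neighbor_overlap_scores`, `flag_dubious`) and the parameters `k` and `threshold` are hypothetical and chosen only for illustration.

```python
import numpy as np


def knn_indices(points, k):
    """Return, for each row, the indices of its k nearest other rows (Euclidean)."""
    sq = np.sum(points ** 2, axis=1)
    # Pairwise squared distances via ||a||^2 + ||b||^2 - 2 a.b
    dist2 = sq[:, None] + sq[None, :] - 2.0 * points @ points.T
    np.fill_diagonal(dist2, np.inf)  # a cell is never its own neighbor
    return np.argsort(dist2, axis=1)[:, :k]


def neighbor_overlap_scores(pre_embedding, embedding_2d, k=10):
    """Stand-in reliability score (an assumption, not scDEED's formula):
    the fraction of a cell's k pre-embedding neighbors that are also
    among its k neighbors in the embedding."""
    pre_nn = knn_indices(pre_embedding, k)
    emb_nn = knn_indices(embedding_2d, k)
    scores = np.empty(len(pre_embedding))
    for i in range(len(scores)):
        shared = np.intersect1d(pre_nn[i], emb_nn[i]).size
        scores[i] = shared / k
    return scores


def flag_dubious(scores, threshold):
    """Mark cells whose score falls below a user-chosen threshold (hypothetical rule)."""
    return scores < threshold
```

Under the same assumptions, hyperparameter guidance would amount to computing `flag_dubious(...).sum()` for the embedding produced by each candidate setting (for example, several t-SNE perplexity values) and preferring the setting with the fewest flagged cells.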