Xia Lucy, Lee Christy, Li Jingyi Jessica
bioRxiv. 2023 Sep 15:2023.04.21.537839. doi: 10.1101/2023.04.21.537839.
Two-dimensional (2D) embedding methods are crucial for single-cell data visualization. Popular methods such as t-SNE and UMAP are commonly used for visualizing cell clusters; however, it is well known that the 2D embeddings produced by t-SNE and UMAP may not reliably reflect the similarities among cell clusters. Motivated by this challenge, we developed a statistical method, scDEED, for detecting dubious cell embeddings output by any 2D-embedding method. scDEED calculates a reliability score for every cell embedding, then identifies the cell embeddings with low reliability scores as dubious and those with high reliability scores as trustworthy. Moreover, by minimizing the number of dubious cell embeddings, scDEED provides intuitive guidance for optimizing the hyperparameters of an embedding method. Applied to multiple scRNA-seq datasets, scDEED proves effective at detecting dubious cell embeddings and at optimizing the hyperparameters of t-SNE and UMAP.
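The workflow described in the abstract (score each cell embedding, flag low scores as dubious, then prefer the hyperparameter setting with the fewest dubious cells) can be illustrated with a minimal sketch. The abstract does not define the reliability score or the cutoff, so everything specific here is an assumption for illustration only, not scDEED's actual definition: the score is taken to be the Pearson correlation between a cell's distances to its `k` nearest neighbors in the original space and its distances to those same cells in the 2D embedding, and `cutoff`, `k`, and all function names are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length coordinate lists."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        # Assumption: a neighborhood with constant distances carries no signal.
        return 0.0
    return cov / math.sqrt(vx * vy)


def reliability_scores(original, embedding, k=10):
    """Illustrative per-cell score (an assumption, not scDEED's definition).

    For each cell, take its k nearest neighbors in the original space and
    correlate the original-space distances to those neighbors with the
    embedding-space distances to the same neighbors.
    """
    scores = []
    for i, cell in enumerate(original):
        others = [j for j in range(len(original)) if j != i]
        d_orig = {j: euclidean(cell, original[j]) for j in others}
        neighbors = sorted(others, key=d_orig.get)[:k]
        d_emb = [euclidean(embedding[i], embedding[j]) for j in neighbors]
        scores.append(pearson([d_orig[j] for j in neighbors], d_emb))
    return scores


def count_dubious(original, embedding, cutoff=0.5, k=10):
    """Number of cells whose score falls below a hypothetical fixed cutoff."""
    return sum(s < cutoff for s in reliability_scores(original, embedding, k))


def pick_embedding(original, candidates, cutoff=0.5, k=10):
    """Return the label of the candidate embedding with the fewest dubious cells.

    candidates maps a hyperparameter label to the 2D embedding it produced.
    """
    return min(
        candidates,
        key=lambda name: count_dubious(original, candidates[name], cutoff, k),
    )
```

In practice, `candidates` would hold embeddings computed at different hyperparameter values (for example, several t-SNE perplexities), and `pick_embedding` would return the label of the setting that leaves the fewest cells flagged. A fixed cutoff is the simplest possible choice for a sketch; a statistical method would more plausibly derive its thresholds from a null distribution.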