Immunology Programme, The Babraham Institute, Babraham Research Campus, Cambridge CB22 3AT, UK.
VIB Center for Brain and Disease Research, 3000 Leuven, Belgium.
Cell Rep Methods. 2023 Jan 13;3(1):100390. doi: 10.1016/j.crmeth.2022.100390. eCollection 2023 Jan 23.
The advent of high-dimensional single-cell data has necessitated the development of dimensionality-reduction tools. t-Distributed stochastic neighbor embedding (t-SNE) and uniform manifold approximation and projection (UMAP) are the two most frequently used approaches, allowing clear visualization of complex single-cell datasets. Despite the need for quantitative comparison, t-SNE and UMAP have largely remained visualization tools due to the lack of robust statistical approaches. Here, we have derived a statistical test for evaluating the difference between dimensionality-reduced datasets using the Kolmogorov-Smirnov test on the distributions of cross entropy of single cells within each dataset. As the approach uses the inter-relationship of single cells for comparison, the resulting statistic is robust and capable of identifying true biological variation. Further, the test provides a valid distance between single-cell datasets, allowing the organization of multiple samples into a dendrogram for quantitative comparison of complex datasets. These results demonstrate the largely untapped potential of dimensionality-reduction tools for biomedical data analysis beyond visualization.
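The comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a t-SNE-style setup in which each cell's neighbour probabilities in the original space (P) are compared with those in the embedding (Q). For brevity it also assumes a fixed Gaussian bandwidth `sigma` instead of perplexity calibration, and a plain 2D linear projection as a stand-in for a real t-SNE or UMAP embedding. The function names `per_cell_cross_entropy` and `compare_samples` are hypothetical.

```python
import numpy as np
from scipy.spatial.distance import cdist
from scipy.stats import ks_2samp


def per_cell_cross_entropy(high, low, sigma=1.0, eps=1e-12):
    """Cross entropy of each cell's neighbour distribution in the original
    space (P) against its counterpart in the embedding (Q).

    Assumption: fixed Gaussian bandwidth for P and a Student-t kernel for Q,
    both row-normalised. Real t-SNE calibrates the bandwidth per cell.
    """
    d_high = cdist(high, high, "sqeuclidean")
    d_low = cdist(low, low, "sqeuclidean")

    p = np.exp(-d_high / (2.0 * sigma**2))
    q = 1.0 / (1.0 + d_low)
    np.fill_diagonal(p, 0.0)  # a cell is not its own neighbour
    np.fill_diagonal(q, 0.0)
    p /= np.maximum(p.sum(axis=1, keepdims=True), eps)
    q /= np.maximum(q.sum(axis=1, keepdims=True), eps)

    # One value per cell: H_i = -sum_j p_ij * log(q_ij)
    return -(p * np.log(q + eps)).sum(axis=1)


def compare_samples(high, low, labels, a, b):
    """Two-sample Kolmogorov-Smirnov test on the per-cell cross entropies
    of the cells labelled `a` versus those labelled `b`."""
    h = per_cell_cross_entropy(high, low)
    return ks_2samp(h[labels == a], h[labels == b])


# Synthetic illustration only: 60 "cells", 10 markers, two samples.
rng = np.random.default_rng(0)
high = np.vstack([
    rng.normal(0.0, 1.0, size=(30, 10)),
    rng.normal(0.5, 1.0, size=(30, 10)),
])
labels = np.repeat(["ctrl", "treated"], 30)

# Placeholder embedding: projection onto the first two principal axes.
centered = high - high.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
low = centered @ vt[:2].T

result = compare_samples(high, low, labels, "ctrl", "treated")
print(result.statistic, result.pvalue)
```

Under these assumptions, the KS statistic lies between 0 and 1 and can serve as a pairwise dissimilarity. For more than two samples, a matrix of such statistics could in principle be passed to a hierarchical clustering routine to produce the kind of dendrogram the abstract mentions.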