Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, Georgia, United States of America.
PLoS Comput Biol. 2021 Mar 16;17(3):e1008804. doi: 10.1371/journal.pcbi.1008804. eCollection 2021 Mar.
With the rapid advances of various single-cell technologies, an increasing number of single-cell datasets are being generated, and computational tools for aligning these datasets, which make subsequent integration or meta-analysis possible, have become critical. Typically, single-cell datasets from different technologies cannot be directly combined or concatenated because of innate differences in the data, such as the number of measured parameters and their distributions. Even datasets generated by the same technology are often affected by batch effects. A computational approach for aligning different datasets, and hence identifying related clusters, will be useful for data integration and interpretation in large-scale single-cell experiments. Our proposed algorithm, called JSOM, is a variation of the self-organizing map (SOM). It aligns two related datasets that contain similar clusters by constructing two maps (low-dimensional, discretized representations of the datasets) that evolve jointly according to both datasets. Here we applied the JSOM algorithm to flow cytometry, mass cytometry, and single-cell RNA sequencing datasets. The resulting JSOM maps not only align the related clusters in the two datasets but also preserve the topology of the datasets, so the maps can be used for further analysis, such as clustering.
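For readers unfamiliar with the base algorithm that JSOM modifies, the following is a minimal sketch of a standard, single-map Kohonen self-organizing map in pure Python. It does not implement JSOM's joint evolution of two maps, whose coupling rule is not given in this abstract. The function names `train_som` and `best_matching_unit`, the linearly decaying learning rate and neighborhood radius, and the Gaussian neighborhood function are illustrative assumptions, not the authors' implementation.

```python
import math
import random


def best_matching_unit(nodes, x):
    """Return the grid position whose prototype is closest (squared Euclidean) to x."""
    return min(nodes, key=lambda pos: sum((wi - xi) ** 2 for wi, xi in zip(nodes[pos], x)))


def train_som(data, rows, cols, epochs=20, lr0=0.5, sigma0=None, seed=0):
    """Train a rectangular SOM (hypothetical helper) on `data`, a list of equal-length float lists.

    Returns a dict mapping grid coordinates (r, c) to prototype vectors.
    """
    rng = random.Random(seed)
    dim = len(data[0])
    if sigma0 is None:
        sigma0 = max(rows, cols) / 2.0
    # Initialize each node's prototype with a copy of a randomly chosen data point.
    nodes = {(r, c): list(rng.choice(data)) for r in range(rows) for c in range(cols)}
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(data)
        rng.shuffle(order)  # present samples in a random order each epoch
        for x in order:
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)  # assumed linear learning-rate decay
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # assumed shrinking neighborhood radius
            bmu = best_matching_unit(nodes, x)
            for pos, w in nodes.items():
                # Gaussian neighborhood measured on the map grid, not in data space,
                # which is what makes neighboring nodes represent similar cells.
                d2 = (pos[0] - bmu[0]) ** 2 + (pos[1] - bmu[1]) ** 2
                h = math.exp(-d2 / (2.0 * sigma ** 2))
                for k in range(dim):
                    w[k] += lr * h * (x[k] - w[k])  # pull the prototype toward the sample
            step += 1
    return nodes
```

After training, each cell can be assigned to its best-matching node, and because grid neighbors hold similar prototypes, the map can then be clustered further, which is the kind of downstream analysis the abstract mentions.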