School of Physical Science and Technology, Inner Mongolia University, Hohhot 010021, China.
School of Mathematical Sciences, Inner Mongolia University, Hohhot 010021, China.
Bioinformatics. 2024 Aug 2;40(8). doi: 10.1093/bioinformatics/btae480.
Unsupervised clustering of single-cell RNA sequencing (scRNA-seq) data holds the promise of characterizing known and novel cell types in various biological and clinical contexts. However, the intrinsically multi-scale nature of clustering resolution makes it challenging to handle multiple sources of variability in high-dimensional, noisy data.
We present ClusterMatch, a stable match optimization model to align scRNA-seq data at the cluster level. On the one hand, ClusterMatch leverages the mutual correspondence obtained from canonical correlation analysis and multi-scale Louvain clustering to identify clusters at optimized resolutions. On the other hand, it uses a stable matching framework to align scRNA-seq data in the latent space while maintaining interpretability through overlapping marker gene sets. Through extensive experiments, we demonstrate the efficacy of ClusterMatch in data integration, cell type annotation, and cross-species/timepoint alignment scenarios. Our results show that ClusterMatch uses both global and local information in scRNA-seq data, sets an appropriate resolution for multi-scale clustering, and offers interpretability through marker genes.
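To make the idea of stable matching at the cluster level concrete, the following is a minimal sketch using the classic Gale-Shapley procedure. It is not the ClusterMatch implementation: the abstract does not specify the optimization model, and the function name `stable_match`, the similarity matrix, and the choice to derive both sides' preferences from the same similarity scores are all assumptions made for illustration. In practice the similarity could come from, for example, correlations between cluster centroids in a shared latent space.

```python
from typing import Dict, List


def stable_match(similarity: List[List[float]]) -> Dict[int, int]:
    """Gale-Shapley stable matching between two sets of clusters.

    similarity[i][j] is a hypothetical similarity score between cluster i
    of dataset A and cluster j of dataset B. Both sides rank partners by
    this same score (an illustrative assumption).
    Returns a mapping {A cluster index: B cluster index}.
    """
    n_a = len(similarity)
    n_b = len(similarity[0]) if n_a else 0

    # Each A cluster ranks B clusters from most to least similar.
    prefs = [
        sorted(range(n_b), key=lambda j: -similarity[i][j])
        for i in range(n_a)
    ]
    next_choice = [0] * n_a          # next B cluster each A cluster proposes to
    engaged_to: Dict[int, int] = {}  # B cluster -> currently accepted A cluster
    free = list(range(n_a))          # A clusters still looking for a partner

    while free:
        a = free.pop()
        if next_choice[a] >= n_b:
            continue  # a has proposed to every B cluster; it stays unmatched
        b = prefs[a][next_choice[a]]
        next_choice[a] += 1
        current = engaged_to.get(b)
        if current is None:
            engaged_to[b] = a                 # b was free: accept
        elif similarity[a][b] > similarity[current][b]:
            engaged_to[b] = a                 # b prefers a: swap partners
            free.append(current)
        else:
            free.append(a)                    # b rejects a: a tries again

    return {a: b for b, a in engaged_to.items()}


# Toy example: two A clusters both prefer B cluster 0.
toy_similarity = [
    [0.9, 0.5],
    [0.8, 0.1],
]
toy_result = stable_match(toy_similarity)
print(toy_result)
```

In the toy example, both A clusters rank B cluster 0 first, but B cluster 0 is more similar to A cluster 0, so A cluster 1 falls back to B cluster 1. The resulting assignment is stable in the sense that no unmatched pair would both prefer each other over their assigned partners.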
The code of ClusterMatch software is freely available at https://github.com/AMSSwanglab/ClusterMatch.