School of Mathematics and Statistics, Shandong University, Weihai 264209, Shandong, China.
College of Mathematics and Informatics, South China Agricultural University, Guangzhou, Guangdong, China.
Brief Bioinform. 2024 May 23;25(4). doi: 10.1093/bib/bbae273.
Cluster analysis is a pivotal step in single-cell sequencing data analysis and offers substantial opportunities to uncover the molecular mechanisms underlying cellular heterogeneity and intercellular phenotypic variation. However, it has an inherent limitation: different clustering algorithms yield different estimates of both the number of clusters and the cluster assignments. This study introduces Single Cell Consistent Clustering based on Spectral Matrix Decomposition (SCSMD), a comprehensive clustering approach that integrates the strengths of multiple methods to determine the optimal clustering scheme. The choice of method is validated by testing the performance of SCSMD across different distance measures with a bespoke evaluation metric, to ensure the optimal efficacy of SCSMD. A consistent clustering test is conducted on 15 real scRNA-seq datasets. Applied to human embryonic stem cell scRNA-seq data, SCSMD successfully identifies known cell types and delineates their developmental trajectories. Similarly, when applied to glioblastoma cells, SCSMD accurately detects the pre-existing cell types and provides a finer subdivision within one of the original clusters. The results affirm the robust performance of SCSMD in terms of both the number of clusters and the cluster assignments. Moreover, we extend SCSMD to larger datasets, which provides additional evidence of its superiority. The findings suggest that SCSMD can be applied to additional scRNA-seq datasets and to further downstream analyses.
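The abstract does not spell out the SCSMD algorithm, so the following is only a generic sketch of the underlying idea, not the authors' implementation: combine the label vectors from several base clusterings into a co-association matrix, estimate the number of clusters from the largest gap in the spectrum of its normalized Laplacian, and assign cells from the leading eigenvectors. The function names (`co_association`, `eigengap_k`, `consensus_labels`), the eigengap heuristic, the simple k-means step, and the toy labels are all assumptions made for illustration.

```python
import numpy as np


def co_association(labelings):
    """Fraction of base clusterings that place each pair of cells together."""
    labelings = np.asarray(labelings)            # shape: (n_methods, n_cells)
    same = labelings[:, :, None] == labelings[:, None, :]
    return same.mean(axis=0)                     # shape: (n_cells, n_cells)


def eigengap_k(similarity, max_k=10):
    """Estimate k from the largest gap between consecutive Laplacian eigenvalues."""
    degree = similarity.sum(axis=1)              # >= 1 because the diagonal is 1
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    normalized = d_inv_sqrt[:, None] * similarity * d_inv_sqrt[None, :]
    laplacian = np.eye(len(similarity)) - normalized
    eigvals, eigvecs = np.linalg.eigh(laplacian)  # eigenvalues in ascending order
    upper = min(max_k, len(eigvals) - 1)
    gaps = np.diff(eigvals[: upper + 1])
    k = int(np.argmax(gaps)) + 1
    return k, eigvecs[:, :k]


def consensus_labels(embedding, k, n_iter=20):
    """Cluster rows of the spectral embedding with a small deterministic k-means."""
    norms = np.linalg.norm(embedding, axis=1, keepdims=True)
    rows = embedding / np.maximum(norms, 1e-12)
    # Farthest-first initialization, starting from the first cell.
    centers = [rows[0]]
    for _ in range(1, k):
        dist = np.min([np.sum((rows - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(rows[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((rows[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = rows[labels == j].mean(axis=0)
    return labels


# Toy input (invented): three base clusterings of six cells that disagree on cell 2.
base_labelings = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1, 1],
]
S = co_association(base_labelings)
k, embedding = eigengap_k(S)
labels = consensus_labels(embedding, k)
print(k, labels.tolist())
```

In this toy case the consensus follows the majority of the base clusterings for the disputed cell. A real analysis would build the base labelings from actual clustering tools run on the expression matrix, and the paper's own choices of distance and evaluation metric may differ from the heuristics used here.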