Kim Hyun, Park Issac, Park Jong-Eun, Kim Jong Kyoung, Seo Minseok, Kim Jae Kyoung
Biomedical Mathematics Group, Pioneer Research Center for Mathematical and Computational Sciences, Institute for Basic Science, Daejeon, Republic of Korea.
Department of Mathematics, Pusan National University, Busan, Republic of Korea.
Nat Commun. 2025 Jul 2;16(1):6031. doi: 10.1038/s41467-025-60702-8.
Clustering is a fundamental step in single-cell RNA sequencing (scRNA-seq) data analysis. However, its reliability is compromised because stochastic processes in clustering algorithms produce inconsistent results across trials. Despite efforts to obtain reliable consensus clusterings, existing methods cannot be applied to large scRNA-seq datasets because of their high computational cost. Here, we develop the single-cell Inconsistency Clustering Estimator (scICE) to evaluate clustering consistency and provide consistent clustering results, achieving up to a 30-fold improvement in speed compared to conventional consensus clustering-based methods such as multiK and chooseR. Applying scICE to 48 real and simulated scRNA-seq datasets, some with over 10,000 cells, successfully identifies all consistent clustering results, substantially narrowing the number of clusters to explore. By focusing on this narrower set of more reliable candidate clusters, users can greatly reduce computational burden while generating more robust results.
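To make the notion of clustering inconsistency among trials concrete, the following is a minimal sketch of one way to score agreement between repeated clustering runs. It is not the scICE estimator, whose definition is not given in this abstract. It substitutes the plain Rand index as a stand-in, and the function names `rand_index` and `mean_pairwise_agreement` are hypothetical.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of cell pairs on which two labelings agree.

    A pair counts as an agreement when both labelings put the two cells
    in the same cluster, or both put them in different clusters.
    Assumption: the plain Rand index is used here as a stand-in, not
    the measure used by scICE.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must cover the same cells")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        return 1.0
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


def mean_pairwise_agreement(trials):
    """Average Rand index over every pair of clustering trials."""
    if len(trials) < 2:
        raise ValueError("need at least two trials to compare")
    scores = [rand_index(a, b) for a, b in combinations(trials, 2)]
    return sum(scores) / len(scores)


# Three trials that give the same partition under different label names.
consistent = [[0, 0, 1, 1], [1, 1, 0, 0], [2, 2, 5, 5]]
# Three trials where one run splits the cells differently.
inconsistent = [[0, 0, 1, 1], [0, 1, 0, 1], [0, 0, 1, 1]]

print(mean_pairwise_agreement(consistent))
print(mean_pairwise_agreement(inconsistent))
```

The score is invariant to how clusters are named, so only genuine changes in the partition lower it. This toy version enumerates every pair of cells, so its cost grows quadratically with the number of cells and it is suitable only for small illustrations.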