School of Computer Science and Technology, Xidian University, Xi'an, 710071, China.
Department of Radiology, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, Zhongshan Road, Guangzhou, 510080, China.
Brief Bioinform. 2021 Sep 2;22(5). doi: 10.1093/bib/bbaa433.
Single-cell RNA-sequencing (scRNA-seq) profiles the transcriptome at the level of individual cells, which helps reveal the heterogeneity and dynamics of cell populations. Advances in biotechnology make it possible to generate scRNA-seq profiles for very large numbers of cells, which requires effective and efficient clustering algorithms to identify cell types and informative genes. Although great effort has been devoted to clustering scRNA-seq data, the accuracy, scalability and interpretability of available algorithms remain unsatisfactory. In this study, we address these problems by developing a joint learning algorithm, joint sparse representation and clustering (jSRC), in which dimension reduction (DR) and clustering are integrated. Specifically, DR provides scalability, and joint learning improves accuracy. To increase the interpretability of patterns, we assume that cells of the same type have similar expression patterns, and we impose a sparse representation on the features. We formulate clustering of scRNA-seq data as an optimization problem and then derive update rules to optimize the jSRC objective. Fifteen scRNA-seq datasets from various tissues and organisms, in which the number of single cells ranges from 49 to 110,824, are used to validate the performance of jSRC. The experimental results demonstrate that jSRC significantly outperforms 12 state-of-the-art methods across various measurements (an average improvement of 20.29%) with less running time. Furthermore, jSRC is efficient and robust across scRNA-seq datasets from various tissues. Finally, jSRC also accurately identifies dynamic cell types associated with the progression of COVID-19. The proposed model and methods provide an effective strategy for analyzing scRNA-seq data (the software is written in MATLAB and is free for academic use; https://github.com/xkmaxidian/jSRC).
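The abstract describes coupling dimension reduction with clustering, imposing sparsity, and optimizing through derived update rules, but it does not state the jSRC objective itself. The sketch below is therefore not jSRC: it is a generic, well-known stand-in (non-negative matrix factorization with an L1 penalty and multiplicative updates) that only illustrates the general idea of learning a low-dimensional sparse representation and reading cluster labels from it. The function name `sparse_nmf_cluster` and the parameters `l1`, `n_iter` and `seed` are hypothetical, and the input is assumed to be a non-negative genes-by-cells matrix.

```python
import numpy as np


def sparse_nmf_cluster(X, k, l1=0.1, n_iter=200, seed=0):
    """Illustrative sparse NMF clustering (NOT the jSRC algorithm).

    X : non-negative array of shape (n_genes, n_cells)
    k : number of latent components, used here as the number of clusters
    Returns (W, H, labels) with X approximately equal to W @ H.
    """
    rng = np.random.default_rng(seed)
    n_genes, n_cells = X.shape
    W = rng.random((n_genes, k)) + 1e-3   # gene loadings per component
    H = rng.random((k, n_cells)) + 1e-3   # low-dimensional cell representation
    eps = 1e-10                           # guards against division by zero

    for _ in range(n_iter):
        # Multiplicative update for H; the l1 term in the denominator
        # shrinks coefficients and so encourages a sparse representation.
        H *= (W.T @ X) / (W.T @ W @ H + l1 + eps)
        # Multiplicative update for W (plain least-squares NMF step).
        W *= (X @ H.T) / (W @ (H @ H.T) + eps)
        # Heuristic: rescale W to unit-norm columns and move the scale
        # into H, so the l1 penalty cannot be evaded by inflating W.
        norms = np.linalg.norm(W, axis=0) + eps
        W /= norms
        H *= norms[:, None]

    # Assign each cell to the component with the largest coefficient.
    labels = H.argmax(axis=0)
    return W, H, labels


if __name__ == "__main__":
    # Synthetic example: two groups of cells, each expressing its own gene block.
    rng = np.random.default_rng(1)
    X = rng.random((20, 40)) * 0.1
    X[:10, :20] += 5.0
    X[10:, 20:] += 5.0
    W, H, labels = sparse_nmf_cluster(X, k=2)
    print(labels)
```

In this stand-in the representation and the cluster assignment come from the same factorization, which loosely mirrors the motivation for integrating DR and clustering rather than running them as separate steps. The actual jSRC objective and update rules are given in the paper and in the MATLAB code at the repository linked above.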