Mei Qinglin, Li Guojun, Su Zhengchang
Research Center for Mathematics and Interdisciplinary Sciences, Shandong University, Jinan 250100, China.
School of Mathematics, Shandong University, Jinan 250100, China.
Bioinformatics. 2021 Oct 11;37(19):3235-3242. doi: 10.1093/bioinformatics/btab276.
Recent breakthroughs in single-cell RNA sequencing (scRNA-seq) technologies offer an exciting opportunity to identify heterogeneous cell types in complex tissues. However, unavoidable biological noise and technical artifacts in scRNA-seq data, together with the high dimensionality of expression vectors, make the problem highly challenging. Consequently, although numerous tools have been developed, their accuracy still needs improvement.
Here, we introduce RCSL (Rank Constrained Similarity Learning), a novel clustering algorithm and tool that accurately identifies cell types using scRNA-seq data from a complex tissue. RCSL considers both local and global similarity among cells to discern the subtle differences among cells of the same type as well as the larger differences among cells of different types. RCSL measures a cell's global similarity by the Spearman's rank correlations of its expression vector with those of other cells, and adaptively learns a neighbor representation of the cell as its local similarity. The overall similarity of a cell to other cells is a linear combination of its global and local similarity. RCSL automatically estimates the number of cell types defined in the similarity matrix, and identifies them by constructing a block-diagonal matrix whose distance to the similarity matrix is minimized. Each diagonal block is a cell cluster/type, corresponding to a connected component in the associated similarity graph. When tested on 16 benchmark scRNA-seq datasets in which the cell types are well annotated, RCSL substantially outperformed six state-of-the-art methods in accuracy and robustness as measured by three metrics.
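The pipeline described above can be illustrated with a minimal Python sketch. It is not the RCSL implementation and makes several simplifying assumptions: the local similarity is a fixed k-nearest-neighbor Gaussian kernel rather than an adaptively learned neighbor representation, and the rank-constrained block-diagonal optimization is replaced by thresholding the combined similarity and taking connected components. All function names and the parameters `k`, `alpha` and `threshold` are hypothetical and chosen only for illustration.

```python
import numpy as np


def rank_rows(x):
    """Rank the values within each row; tied values share their mean rank."""
    ranks = np.empty(x.shape, dtype=float)
    for i, row in enumerate(x):
        order = np.argsort(row, kind="stable")
        r = np.empty(len(row), dtype=float)
        r[order] = np.arange(1, len(row) + 1)
        _, inverse, counts = np.unique(row, return_inverse=True, return_counts=True)
        sums = np.bincount(inverse, weights=r)
        ranks[i] = sums[inverse] / counts[inverse]
    return ranks


def global_similarity(expr):
    """Spearman rank correlation between every pair of cells (rows)."""
    return np.corrcoef(rank_rows(expr))


def local_similarity(expr, k):
    """Assumed stand-in: symmetric Gaussian weights on each cell's k nearest cells."""
    sq = np.sum((expr[:, None, :] - expr[None, :, :]) ** 2, axis=-1)
    np.fill_diagonal(sq, np.inf)              # a cell is not its own neighbor
    nn = np.argsort(sq, axis=1)[:, :k]        # indices of the k nearest cells
    rows = np.repeat(np.arange(len(expr)), k)
    cols = nn.ravel()
    bandwidth = sq[rows, cols].mean()         # data-driven kernel width
    w = np.zeros_like(sq)
    w[rows, cols] = np.exp(-sq[rows, cols] / bandwidth)
    return np.maximum(w, w.T)


def connected_components(adj):
    """Label the connected components of a boolean adjacency matrix."""
    labels = np.full(len(adj), -1)
    current = 0
    for start in range(len(adj)):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            node = stack.pop()
            for nb in np.flatnonzero(adj[node]):
                if labels[nb] == -1:
                    labels[nb] = current
                    stack.append(nb)
        current += 1
    return labels


def cluster_cells(expr, k=2, alpha=0.5, threshold=0.3):
    """Linear combination of global and local similarity, then graph components."""
    expr = np.asarray(expr, dtype=float)
    s = alpha * global_similarity(expr) + (1 - alpha) * local_similarity(expr, k)
    # Assumption: a hard threshold replaces the rank-constrained optimization.
    return connected_components(s > threshold)


def swapped(v, i, j):
    """Copy of v with entries i and j exchanged (a small perturbation)."""
    w = v.copy()
    w[[i, j]] = w[[j, i]]
    return w


# Toy data: 6 cells x 8 genes, two opposite expression trends.
up = np.arange(1.0, 9.0)
down = up[::-1]
toy = np.array([up, swapped(up, 0, 1), swapped(up, 6, 7),
                down, swapped(down, 0, 1), swapped(down, 6, 7)])
labels = cluster_cells(toy)
print(labels)
```

On this toy matrix the first three cells and the last three cells fall into separate components, so the number of clusters follows from the graph rather than being supplied in advance. In this sketch that number depends on the hand-picked threshold, whereas RCSL estimates it from the similarity matrix itself.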
The RCSL algorithm is implemented in R and can be freely downloaded at https://cran.r-project.org/web/packages/RCSL/index.html.
Supplementary data are available at Bioinformatics online.