College of Mathematics and System Sciences, Xinjiang University, Urumqi, China.
Institute of Mathematics and Physics, Xinjiang University, Urumqi, China.
Interdiscip Sci. 2021 Mar;13(1):83-90. doi: 10.1007/s12539-020-00411-6. Epub 2021 Jan 21.
Clustering is a common method for identifying cell types in single-cell analysis, but the growing size of scRNA-seq datasets poses challenges for single-cell clustering. There is therefore an urgent need for faster and more accurate clustering methods for large-scale scRNA-seq data. In this paper, we propose a new method for single-cell clustering. First, a count matrix is constructed through normalization and gene filtering. Second, the raw gene expression matrix is projected into a feature space obtained by a secondary construction of the feature space based on UMAP (Uniform Manifold Approximation and Projection). Third, the resulting low-dimensional matrix is randomly divided, according to a given proportion, into two sub-matrices: one for clustering and the other for classification. Finally, the first subset is clustered with the k-means algorithm, and the second subset is then classified with the k-nearest neighbor algorithm based on the clustering results. Experimental results show that our method clusters scRNA-seq datasets effectively.
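The split-cluster-classify procedure in the third and fourth steps can be sketched as follows. This is a minimal, stdlib-only Python illustration under stated assumptions, not the authors' implementation. It assumes the low-dimensional embedding (for example from UMAP, which is not shown here) is already available as a list of coordinate tuples. The names `split_cluster_classify`, `cluster_fraction`, `n_neighbors`, and `seed` are hypothetical, and the abstract does not state the split proportion or neighbor count the paper used. k-means is seeded with a deterministic farthest-point initialization. That is a simplification chosen for reproducibility, not something taken from the paper.

```python
import math
import random
from collections import Counter


def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's k-means with farthest-point initialization; returns per-point labels."""
    rng = random.Random(seed)
    # Farthest-point init: start from a random point, then repeatedly add the
    # point whose distance to its nearest existing centroid is largest.
    centroids = [list(points[rng.randrange(len(points))])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid (Euclidean).
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:  # converged: assignments stopped changing
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def knn_classify(train_points, train_labels, query_points, n_neighbors=5):
    """Brute-force k-nearest-neighbor majority vote."""
    predicted = []
    for q in query_points:
        nearest = sorted(range(len(train_points)),
                         key=lambda i: math.dist(q, train_points[i]))[:n_neighbors]
        votes = Counter(train_labels[i] for i in nearest)
        predicted.append(votes.most_common(1)[0][0])
    return predicted


def split_cluster_classify(embedding, k, cluster_fraction=0.3, n_neighbors=5, seed=0):
    """Hypothetical helper: cluster a random subset, then label the rest by kNN."""
    rng = random.Random(seed)
    indices = list(range(len(embedding)))
    rng.shuffle(indices)  # random division into two sub-matrices
    n_cluster = min(len(indices), max(k, round(cluster_fraction * len(indices))))
    cluster_idx, classify_idx = indices[:n_cluster], indices[n_cluster:]

    # Cluster one subset with k-means.
    cluster_pts = [embedding[i] for i in cluster_idx]
    cluster_labels = kmeans(cluster_pts, k, seed=seed)

    # Classify the other subset with kNN trained on the k-means labels.
    query_pts = [embedding[i] for i in classify_idx]
    query_labels = knn_classify(cluster_pts, cluster_labels, query_pts, n_neighbors)

    # Reassemble labels in the original cell order.
    labels = [None] * len(embedding)
    for i, lab in zip(cluster_idx, cluster_labels):
        labels[i] = lab
    for i, lab in zip(classify_idx, query_labels):
        labels[i] = lab
    return labels
```

In this sketch, iterative k-means runs only on the sampled fraction, and the remaining cells are labeled by nearest-neighbor voting against it. The clustering cost therefore scales with the subset size rather than the full dataset. The kNN step is a brute-force scan, which a practical implementation would likely replace with an indexed neighbor search.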