Institute of Molecular Medicine, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai 200127, China.
Department of Endocrinology and Metabolism, Shanghai Tenth People's Hospital, School of Medicine, Tongji University, Shanghai 200127, China.
Nucleic Acids Res. 2021 Feb 22;49(3):e18. doi: 10.1093/nar/gkaa1157.
Single-cell RNA sequencing (scRNA-seq), together with cell type identification algorithms, enables us to characterize cellular heterogeneity at single-cell resolution. However, the noise inherent in scRNA-seq data severely compromises the accuracy of cell clustering, marker identification and visualization. We propose that clustering based on feature density profiles can distinguish informative features from noise. We call this strategy 'entropy subspace' separation and, by integrating it with a consensus clustering method, designed a cell clustering algorithm called ENtropy subspace separation-based Clustering for nOise REduction (ENCORE). We demonstrate that ENCORE achieves superior cell clustering performance and generates high-resolution visualizations across 12 standard datasets. More importantly, ENCORE enables the identification of biologically significant group markers from a hard-to-separate dataset. With its advantages of effective feature selection, improved clustering, accurate marker identification and high-resolution visualization, we present ENCORE to the community as an important tool for scRNA-seq data analysis to study cellular heterogeneity and discover group markers.
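The idea of separating features by their density profiles can be illustrated with a small toy sketch. This is not the ENCORE implementation. It assumes one possible reading of the abstract, in which each gene's expression values are summarized as a normalized histogram (its density profile), the Shannon entropy of each profile is reported, and genes are split into two groups by a minimal 2-means clustering of those profiles. The helper names (`density_profile`, `shannon_entropy`, `two_means`, `make_toy_genes`), the bin count and the simulated data are all hypothetical, and the consensus clustering step is not shown.

```python
import math
import random


def density_profile(values, n_bins=10):
    """Histogram of min-max scaled values, normalized to sum to 1 (assumed profile)."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # guard against constant genes
    counts = [0] * n_bins
    for v in values:
        idx = min(int((v - lo) / span * n_bins), n_bins - 1)
        counts[idx] += 1
    return [c / len(values) for c in counts]


def shannon_entropy(profile):
    """Shannon entropy (in bits) of a normalized histogram."""
    return -sum(p * math.log2(p) for p in profile if p > 0)


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def two_means(points, n_iter=20):
    """Minimal 2-means; deterministic init: first point and the point farthest from it."""
    centers = [points[0], max(points, key=lambda p: sq_dist(p, points[0]))]
    labels = [0] * len(points)
    for _ in range(n_iter):
        labels = [
            0 if sq_dist(p, centers[0]) <= sq_dist(p, centers[1]) else 1
            for p in points
        ]
        for k in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def make_toy_genes(n_cells=200, n_marker=5, n_noise=20, seed=0):
    """Simulated data (hypothetical): bimodal 'marker' genes first, then unimodal noise genes."""
    rng = random.Random(seed)
    half = n_cells // 2
    genes = []
    for _ in range(n_marker):
        # Two cell groups with clearly different expression levels.
        genes.append(
            [rng.gauss(0.0, 0.5) for _ in range(half)]
            + [rng.gauss(5.0, 0.5) for _ in range(n_cells - half)]
        )
    for _ in range(n_noise):
        # Same distribution in every cell: carries no group information.
        genes.append([rng.gauss(2.5, 1.0) for _ in range(n_cells)])
    return genes


genes = make_toy_genes()
profiles = [density_profile(g) for g in genes]
entropies = [shannon_entropy(p) for p in profiles]
labels = two_means(profiles)

for k in (0, 1):
    members = [e for e, lab in zip(entropies, labels) if lab == k]
    mean_entropy = sum(members) / len(members)
    print(f"feature group {k}: {len(members)} genes, mean profile entropy {mean_entropy:.2f} bits")
```

In this toy setting the bimodal genes and the unimodal noise genes fall into separate feature groups because their histograms have different shapes. How the informative group is chosen and then passed to consensus clustering is specific to the published method and is not reproduced here.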