IEEE/ACM Trans Comput Biol Bioinform. 2021 Mar-Apr;18(2):431-442. doi: 10.1109/TCBB.2019.2931582. Epub 2021 Apr 6.
Single-cell RNA sequencing (scRNA-seq) technology provides quantitative gene expression profiles at single-cell resolution. As a result, researchers have established new ways to explore cell population heterogeneity and the genetic variability of cells. One current research direction for scRNA-seq data is to identify different cell types accurately through unsupervised clustering methods. However, scRNA-seq data analysis is challenging because of the data's high noise level, high dimensionality, and sparsity. Moreover, the impact of multiple latent factors on gene expression heterogeneity, and on the ability to accurately identify cell types, remains unclear. Overcoming these challenges to reveal the biological differences between cell types has become key to analyzing scRNA-seq data. For these reasons, unsupervised learning for cell population discovery from scRNA-seq data has become an important research area. A cell similarity assessment method plays a significant role in cell clustering. Here, we present BioRank, a new cell similarity assessment method based on annotated gene sets and gene ranks. To evaluate its performance, we cluster cells with two classical clustering algorithms using the cell-to-cell similarities computed by BioRank. In addition, BioRank can be used with any clustering algorithm that requires a similarity matrix. Applying BioRank to 12 public scRNA-seq datasets, we show that it performs better than, or at least as well as, several popular similarity assessment methods for single-cell clustering.
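The abstract describes a pipeline in which a similarity method produces a cell-by-cell similarity matrix that any similarity-based clustering algorithm can consume, but it does not give BioRank's formula. The sketch below only illustrates that interface under stated assumptions. The hypothetical `similarity_matrix` scores two cells by averaging Spearman rank correlations computed within each annotated gene set. The hypothetical `average_linkage` is a generic agglomerative clustering on that matrix. Neither is the BioRank method nor one of the two clustering algorithms used in the paper, and the toy expression values and gene sets are invented for illustration.

```python
from itertools import combinations


def rank(values):
    """Return 1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i..j, converted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def similarity_matrix(cells, gene_sets):
    """Hypothetical gene-set rank similarity (NOT BioRank): for each pair of
    cells, average the Spearman correlation (Pearson on within-set ranks)
    over all annotated gene sets."""
    n = len(cells)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            scores = []
            for genes in gene_sets.values():
                ri = rank([cells[i][g] for g in genes])
                rj = rank([cells[j][g] for g in genes])
                scores.append(pearson(ri, rj))
            sim[i][j] = sim[j][i] = sum(scores) / len(scores)
    return sim


def average_linkage(sim, k):
    """Generic agglomerative clustering on a similarity matrix: repeatedly
    merge the two clusters with the highest mean pairwise similarity."""
    clusters = [[i] for i in range(len(sim))]
    while len(clusters) > k:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            s = sum(sim[i][j] for i in clusters[a] for j in clusters[b])
            s /= len(clusters[a]) * len(clusters[b])
            if best is None or s > best[0]:
                best = (s, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # safe because a < b
    return clusters


def cluster_labels(sim, k):
    """Convert clusters into one integer label per cell."""
    labels = [0] * len(sim)
    for label, members in enumerate(average_linkage(sim, k)):
        for m in members:
            labels[m] = label
    return labels


# Toy data (invented): 4 cells x 6 genes, two annotated gene sets.
cells = [
    [1, 2, 3, 1, 2, 3],
    [2, 4, 6, 2, 5, 9],
    [3, 2, 1, 3, 2, 1],
    [9, 4, 1, 7, 3, 0],
]
gene_sets = {"set_a": [0, 1, 2], "set_b": [3, 4, 5]}

sim = similarity_matrix(cells, gene_sets)
print(cluster_labels(sim, 2))  # → [0, 0, 1, 1]
```

Because the clustering step only reads `sim`, any other similarity function with the same output shape could be swapped in without changing `average_linkage`. This decoupling is the property the abstract attributes to BioRank.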