IEEE/ACM Trans Comput Biol Bioinform. 2021 Nov-Dec;18(6):2249-2260. doi: 10.1109/TCBB.2020.2979717. Epub 2021 Dec 8.
The advent of single-cell RNA sequencing (scRNA-seq) techniques opens up new opportunities for studying cell-specific changes in transcriptomic data. An important research problem in scRNA-seq data analysis is identifying cell subpopulations with distinct functions. However, the expression profiles of individual cells are usually measured over tens of thousands of genes, and clustering the cells effectively on such high-dimensional profiles remains difficult. An additional challenge is that scRNA-seq data are often noisy and sometimes extremely sparse due to technical limitations and sampling deficiencies. In this paper, we propose a biclustering-based framework called DivBiclust that effectively identifies cell subpopulations from high-dimensional, noisy scRNA-seq data. Compared with nine state-of-the-art methods, DivBiclust identifies cell subpopulations with high accuracy, as evidenced by our experiments on ten real scRNA-seq datasets of different sizes and diverse dropout rates. The supplemental materials of DivBiclust, including the source code, data, and a supplementary document, are available at https://www.github.com/Qiong-Fang/DivBiclust.
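The abstract describes the datasets as sparse and as having diverse dropout rates. As a rough illustration of what such a figure can mean, the sketch below computes the fraction of zero entries in a cells-by-genes count matrix. Treating the dropout rate as the zero fraction is an assumption made here for illustration and may differ from the definition used in the paper; the function name `dropout_rate` and the toy matrix are hypothetical and are not taken from the DivBiclust code.

```python
def dropout_rate(counts):
    """Return the fraction of zero entries in a cells-by-genes count matrix.

    Assumption: "dropout rate" is approximated as the share of zero counts.
    `counts` is a list of rows (one per cell), each a list of gene counts.
    """
    total = sum(len(row) for row in counts)
    if total == 0:
        raise ValueError("count matrix is empty")
    zeros = sum(1 for row in counts for value in row if value == 0)
    return zeros / total


# Hypothetical toy matrix: 4 cells (rows) by 5 genes (columns).
toy_counts = [
    [0, 3, 0, 0, 1],
    [2, 0, 0, 5, 0],
    [0, 0, 0, 0, 4],
    [1, 0, 7, 0, 0],
]

rate = dropout_rate(toy_counts)
print(f"zero fraction: {rate:.2f}")
```

On real data the matrix would typically hold thousands of cells and tens of thousands of genes, so a sparse representation would be the more practical choice; the plain-list version is used here only to keep the sketch dependency-free.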