Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, USA.
Meakins-Christie Laboratories, Department of Medicine, McGill University Health Centre, Montreal, Quebec, H4A 3J1, Canada.
Genome Res. 2022 Sep 27;32(9):1765-1775. doi: 10.1101/gr.276609.122.
One of the first steps in the analysis of single-cell RNA sequencing (scRNA-seq) data is the assignment of cell types. Although a number of supervised methods have been developed for this task, in most cases the assignment is performed by first clustering cells in a low-dimensional space and then assigning cell types to the resulting clusters. To overcome noise and improve cell type assignments, we developed UNIFAN, a neural network method that simultaneously clusters and annotates cells using known gene sets. UNIFAN combines a low-dimensional representation of all genes with cell-specific gene set activity scores to determine the clustering. We applied UNIFAN to human and mouse scRNA-seq data sets from several different organs. We show that, by using knowledge about gene sets, UNIFAN greatly outperforms prior methods developed for clustering scRNA-seq data. The gene sets that UNIFAN assigns to each cluster provide strong evidence for the cell type that the cluster represents, making annotation easier.
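The general idea of clustering on a low-dimensional representation combined with per-cell gene set activity scores can be illustrated with a small sketch. This is not UNIFAN's neural network: as stated assumptions, it substitutes PCA for the learned representation, the mean expression of member genes for the activity score, and plain k-means for the clustering step. The gene names, the gene sets `set_AB` and `set_CD`, and the toy expression matrix are all hypothetical.

```python
import numpy as np


def gene_set_scores(expr, gene_names, gene_sets):
    """Per-cell activity score: mean expression of each set's member genes.
    (Assumption: a simple stand-in, not UNIFAN's learned scores.)"""
    index = {g: i for i, g in enumerate(gene_names)}
    cols = []
    for members in gene_sets.values():
        idx = [index[g] for g in members if g in index]
        cols.append(expr[:, idx].mean(axis=1) if idx else np.zeros(expr.shape[0]))
    return np.column_stack(cols)


def low_dim(expr, n_components):
    """PCA via SVD, standing in for a learned low-dimensional representation."""
    centered = expr - expr.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


def kmeans(x, k, n_iter=50):
    """Plain k-means with deterministic farthest-first initialization."""
    centers = [x[0]]
    for _ in range(1, k):
        # Distance from every point to its nearest already-chosen center.
        d = np.min([((x - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(x[d.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        d = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = x[labels == j].mean(axis=0)
    return labels


# Hypothetical toy data: 40 cells x 4 genes, two well-separated groups.
rng = np.random.default_rng(0)
gene_names = ["A", "B", "C", "D"]
gene_sets = {"set_AB": ["A", "B"], "set_CD": ["C", "D"]}
group1 = rng.normal([5.0, 5.0, 0.0, 0.0], 0.1, size=(20, 4))
group2 = rng.normal([0.0, 0.0, 5.0, 5.0], 0.1, size=(20, 4))
expr = np.vstack([group1, group2])

# Combine both views of each cell, then cluster on the joint features.
# (No rescaling is applied here; real data would likely need it.)
scores = gene_set_scores(expr, gene_names, gene_sets)
features = np.hstack([low_dim(expr, n_components=2), scores])
labels = kmeans(features, k=2)

# Annotate each cluster with its most active gene set.
set_names = list(gene_sets)
annotations = {
    int(c): set_names[scores[labels == c].mean(axis=0).argmax()]
    for c in np.unique(labels)
}
print(annotations)
```

In this sketch the gene set with the highest mean score in a cluster serves as a hint for annotating it, which loosely mirrors how the gene sets assigned to a cluster are used as evidence for its cell type.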