Department of Statistics, University of Wisconsin-Madison, Madison, WI 53706, USA.
Department of Biostatistics and Epidemiology, Rutgers School of Public Health, Piscataway, NJ 08854, USA.
Brief Bioinform. 2021 Nov 5;22(6). doi: 10.1093/bib/bbab295.
Single-cell RNA sequencing (scRNA-seq) technologies facilitate the characterization of transcriptomic landscapes in diverse species, tissues, and cell types with unprecedented molecular resolution. To evaluate biological hypotheses using high-dimensional single-cell gene expression data, most computational and statistical methods depend on a gene feature selection step that identifies genes with high biological variability and reduces computational complexity. Although many gene selection methods have been developed for scRNA-seq analysis, a systematic comparison of the assumptions, statistical models, and selection criteria used by these methods is lacking. In this article, we summarize and discuss 17 computational methods for selecting gene features in unsupervised analysis of single-cell gene expression data, using unified notation and statistical frameworks. Our discussion provides a summary to help practitioners select appropriate methods based on their assumptions and applicability, and to assist method developers in designing new computational tools for unsupervised learning of scRNA-seq data.