Graduate Programs in Molecular Biosciences, Rutgers, The State University of New Jersey, 604 Allison Rd, Piscataway, 08854, NJ, USA.
Department of Biochemistry and Molecular Biology, Rutgers, The State University of New Jersey, 604 Allison Road, Piscataway, 08854, NJ, USA.
Brief Bioinform. 2023 Sep 20;24(5). doi: 10.1093/bib/bbad294.
The advent of single-cell RNA sequencing (scRNA-seq) technologies has enabled gene expression profiling at single-cell resolution, making it possible to quantify and compare transcriptional variability among individual cells. Although alterations in transcriptional variability have been observed in various biological states, statistical methods for quantifying and testing differential variability between groups of cells are still lacking. To identify best practices in differential variability analysis of single-cell gene expression data, we propose and compare 12 statistical pipelines that use different combinations of methods for normalization, feature selection, dimensionality reduction and variability calculation. Using high-quality synthetic scRNA-seq datasets, we benchmarked the proposed pipelines and found that the most powerful and accurate pipeline performs simple library size normalization, retains all genes in the analysis and uses denSNE-based distances to cluster medoids as the variability measure. By applying this pipeline to scRNA-seq datasets from COVID-19 and autism patients, we identified changes in cellular variability between patients with different severity status or between patients and healthy controls.
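The best-performing pipeline described above (library size normalization, no gene filtering, distance to a medoid in a low-dimensional embedding) can be illustrated with a minimal sketch. This is not the authors' implementation. It rests on several assumptions: PCA is swapped in for denSNE so that the example depends only on NumPy, each group of cells is treated as a single cluster with one medoid, and the function names, the `scale` factor and the log transform are all hypothetical choices made for illustration.

```python
import numpy as np


def library_size_normalize(counts, scale=1e4):
    """Scale each cell (row) to the same total count, then log1p-transform.

    The scale factor and log transform are illustrative assumptions.
    """
    totals = counts.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1  # avoid division by zero for empty cells
    return np.log1p(counts / totals * scale)


def embed_pca(x, n_components=2):
    """Stand-in embedding via PCA; the abstract's pipeline uses denSNE."""
    centered = x - x.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


def distances_to_medoid(embedding):
    """Euclidean distance from every cell to the medoid of its group.

    The medoid is the cell with the smallest summed distance to all others.
    """
    diff = embedding[:, None, :] - embedding[None, :, :]
    pairwise = np.sqrt((diff ** 2).sum(axis=-1))
    medoid = pairwise.sum(axis=1).argmin()
    return pairwise[medoid]


def variability_by_group(counts, labels, n_components=2):
    """Per-group distances to the medoid, using all genes (no feature selection).

    Assumes one cluster per group; all cells are embedded jointly.
    """
    labels = np.asarray(labels)
    embedding = embed_pca(library_size_normalize(counts), n_components)
    return {
        group: distances_to_medoid(embedding[labels == group])
        for group in np.unique(labels).tolist()
    }
```

Given a cells-by-genes count matrix and one group label per cell, `variability_by_group` returns one array of distances per group. A larger typical distance suggests higher cell-to-cell variability in that group. How the two distributions are then compared statistically is left open here, since the abstract does not name the test.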