School of Mathematics and Statistics, University of Sydney, Sydney, NSW 2006, Australia.
Cancer Research UK Cambridge Institute, University of Cambridge, Li Ka Shing Centre, Robinson Way, Cambridge CB2 0RE, UK.
Gigascience. 2019 Sep 1;8(9). doi: 10.1093/gigascience/giz106.
Single-cell RNA-seq (scRNA-seq) profiling has revealed remarkable variation in transcription, suggesting that expression of many genes at the single-cell level is intrinsically stochastic and noisy. Yet, at the cell population level, a subset of genes traditionally referred to as housekeeping genes (HKGs) are found to be stably expressed across different cell and tissue types. This raises two questions: can stably expressed genes (SEGs) be identified at the single-cell level, and if so, how can their expression stability be assessed? We have previously proposed a computational framework for ranking the expression stability of genes in single cells, for use in scRNA-seq data normalization and integration. In this study, we perform a detailed evaluation and characterization of SEGs derived from this framework.
Here, we show that gene expression stability indices derived from the early human and mouse development scRNA-seq datasets and the "Mouse Atlas" dataset are reproducible and conserved across species. We demonstrate that, across diverse biological systems, SEGs identified from single cells based on their stability indices are considerably more stable than HKGs previously defined from cell populations. Our analyses indicate that SEGs are inherently more stable at the single-cell level and that their characteristics are reminiscent of HKGs, suggesting a potential role in sustaining essential functions in individual cells.
SEGs identified in this study have immediate utility both for understanding variation and stability of single-cell transcriptomes and for practical applications such as scRNA-seq data normalization. Our framework for calculating the gene stability index, "scSEGIndex," is incorporated into the scMerge Bioconductor R package (https://sydneybiox.github.io/scMerge/reference/scSEGIndex.html) and can be used to identify genes with stable expression in scRNA-seq datasets.