Department of Physiology, Center of Systems Molecular Medicine, Medical College of Wisconsin, Milwaukee, WI, USA.
Department of Physiology, University of Arizona College of Medicine - Tucson, Tucson, AZ, USA.
BMC Genomics. 2023 Jul 3;24(1):371. doi: 10.1186/s12864-023-09487-y.
A common feature of single-cell RNA-seq (scRNA-seq) data is that the number of cells in a cell cluster may vary widely, ranging from a few dozen to several thousand. It is not clear whether scRNA-seq data from a small number of cells allow robust identification of differentially expressed genes (DEGs) with various characteristics.
We addressed this question by performing scRNA-seq and poly(A)-dependent bulk RNA-seq in comparable aliquots of purified vascular endothelial and smooth muscle cells derived from human induced pluripotent stem cells. We found that a cluster needed to contain 2,000 or more cells for scRNA-seq data to identify the majority of DEGs that would show modest differences in a bulk RNA-seq analysis. On the other hand, clusters with as few as 50-100 cells may be sufficient for identifying the majority of DEGs that would have extremely small p values, or transcript abundance greater than a few hundred transcripts per million, in a bulk RNA-seq analysis.
The findings of the current study provide a quantitative reference for designing studies that aim to identify DEGs for specific cell clusters using scRNA-seq data, and for interpreting the results of such studies.