Chen Minhui, Dahl Andy
Section of Genetic Medicine, University of Chicago, Chicago, IL 60637.
bioRxiv. 2023 Feb 27:2023.02.24.529987. doi: 10.1101/2023.02.24.529987.
The development of single-cell RNA sequencing (scRNA-seq) offers opportunities to characterize cellular heterogeneity at unprecedented resolution. Although scRNA-seq has been widely used to identify and characterize gene expression variation across cell types and cell states based on their average gene expression profiles, most studies ignore variation across individual donors. Modelling this interindividual variation could improve statistical power to detect cell type-specific biology and inform the genes and cell types that underlie complex traits. We therefore develop CTMM (Cell Type-specific linear Mixed Model), a new model to detect and quantify cell type-specific variation across individuals. CTMM operates on cell type-specific pseudobulk expression and is fit with efficient methods that scale to hundreds of samples. We use extensive simulations to show that CTMM is powerful and unbiased in realistic settings. We also derive calibrated tests for cell type-specific interindividual variation, which is challenging given the modest sample sizes in scRNA-seq data. We apply CTMM to scRNA-seq data from human induced pluripotent stem cells to characterize the transcriptomic variation across donors as cells differentiate into endoderm. We find that almost 100% of transcriptome-wide variability between donors is differentiation stage-specific. CTMM also identifies individual genes with statistically significant stage-specific variability across samples, including 61 genes that do not have significant stage-specific mean expression. Finally, we extend CTMM to partition interindividual covariance between stages, which recapitulates the overall differentiation trajectory. Overall, CTMM is a powerful tool to characterize a novel dimension of cell type-specific biology in scRNA-seq.
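As a rough illustration of what "cell type-specific pseudobulk expression" could look like in practice, the sketch below averages one gene's per-cell expression within each donor and cell type, then computes the across-donor variance separately for each cell type. The function name `cell_type_pseudobulk` and the toy data are hypothetical and are not taken from the CTMM software. The sketch only prepares the kind of input the abstract describes; it does not fit the mixed model or perform any of the tests.

```python
import numpy as np

def cell_type_pseudobulk(expr, donors, cell_types):
    """Average one gene's per-cell expression within each (donor, cell type) pair.

    Hypothetical helper for illustration only. Returns
    (pseudobulk, donor_ids, type_ids); pseudobulk[i, c] is NaN when
    donor i has no cells of type c.
    """
    expr = np.asarray(expr, dtype=float)
    donor_ids, d_idx = np.unique(donors, return_inverse=True)
    type_ids, t_idx = np.unique(cell_types, return_inverse=True)

    sums = np.zeros((len(donor_ids), len(type_ids)))
    counts = np.zeros_like(sums)
    # Accumulate per-cell values into their (donor, cell type) bin.
    np.add.at(sums, (d_idx, t_idx), expr)
    np.add.at(counts, (d_idx, t_idx), 1)

    # Empty bins give 0/0, which we deliberately leave as NaN.
    with np.errstate(invalid="ignore", divide="ignore"):
        pseudobulk = sums / counts
    return pseudobulk, donor_ids, type_ids

# Toy data (made up): six cells, two donors, two cell types.
expr = [1.0, 3.0, 2.0, 4.0, 6.0, 5.0]
donors = ["d1", "d1", "d1", "d2", "d2", "d2"]
cell_types = ["A", "A", "B", "A", "B", "B"]

pb, donor_ids, type_ids = cell_type_pseudobulk(expr, donors, cell_types)

# Sample variance across donors, computed separately for each cell type.
var_by_type = np.nanvar(pb, axis=0, ddof=1)
```

Differences between the entries of `var_by_type` are a crude descriptive analogue of the cell type-specific variation that CTMM models formally; unlike the model, this summary does not separate true interindividual variance from measurement noise in the pseudobulk means.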