Department of Biostatistics, Bioinformatics, and Biomathematics, Georgetown University Medical Center, Washington, District of Columbia, USA.
Computational and Statistical Genomics Branch, National Human Genome Research Institute (NHGRI), National Institutes of Health (NIH), Baltimore, Maryland, USA.
Genet Epidemiol. 2022 Dec;46(8):615-628. doi: 10.1002/gepi.22494. Epub 2022 Jul 5.
To understand phenotypic variation and the key factors that affect disease susceptibility for complex traits, it is important to decipher the cell-type composition of tissues. To study the cellular composition of bulk tissue samples, one can estimate cellular abundances and cell-type-specific gene expression patterns from tissue transcriptome profiles. We develop both fixed-effect and mixed-effect models that reconstruct cellular expression fractions for bulk-profiled samples using reference single-cell (sc) RNA-sequencing (RNA-seq) data. In benchmark evaluations of estimating cellular expression fractions, the mixed-effect models give results similar to those of a machine learning algorithm named cell-type identification by estimating relative subsets of RNA transcripts (CIBERSORTx), a well-known and reliable procedure for reconstructing cell-type abundances and cell-type-specific gene expression profiles. In real data analysis, the mixed-effect models perform better than or similarly to CIBERSORTx. The mixed models outperform the fixed models in both the benchmark evaluations and the data analysis. In simulation studies, we show that when heterogeneity exists in the scRNA-seq data, mixed models with heterogeneous means and variance-covariance structures are preferable. As a byproduct, the mixed models provide fractions of the covariance between subject-specific gene expression and cell types, which measure their correlations. The proposed mixed models provide a complementary tool for dissecting bulk tissues with scRNA-seq data.
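The general idea of reconstructing fractions from a single-cell reference can be illustrated with a generic fixed-effect deconvolution sketch. It assumes that a bulk profile is modeled as a signature matrix (the mean expression of each cell type in the scRNA-seq reference) multiplied by a vector of non-negative weights. Those weights are estimated by non-negative least squares and rescaled to sum to one. The function names, the projected-gradient solver, and the toy reference data are hypothetical illustrations. This is not the authors' fixed or mixed model and not CIBERSORTx.

```python
import numpy as np

# Hypothetical sketch: generic non-negative least-squares deconvolution,
# NOT the paper's fixed/mixed models and NOT CIBERSORTx.

def signature_from_reference(expr, labels):
    """Average single-cell expression per cell type.

    expr: genes x cells array; labels: one cell-type label per cell.
    Returns a genes x cell-types signature matrix and the sorted type names.
    """
    cell_types = sorted(set(labels))
    labels = np.asarray(labels)
    sig = np.column_stack([expr[:, labels == ct].mean(axis=1) for ct in cell_types])
    return sig, cell_types


def estimate_fractions(sig, bulk, n_iter=2000):
    """Solve min ||sig @ p - bulk||^2 subject to p >= 0 by projected gradient
    descent, then rescale p so that it sums to one."""
    gram = sig.T @ sig
    # Step size 1/L, where L is the largest eigenvalue of sig^T sig
    # (the Lipschitz constant of the gradient of the half squared loss).
    step = 1.0 / np.linalg.eigvalsh(gram).max()
    sty = sig.T @ bulk
    p = np.full(sig.shape[1], 1.0 / sig.shape[1])  # start from equal weights
    for _ in range(n_iter):
        grad = gram @ p - sty
        p = np.maximum(p - step * grad, 0.0)  # project onto p >= 0
    return p / p.sum()


# Toy reference: 6 cells (rows) x 5 genes (columns), transposed to genes x cells.
cells_by_genes = np.array([
    [10, 1, 0, 2, 1], [12, 1, 0, 2, 1],   # B_cell
    [1, 9, 1, 0, 2],  [1, 11, 1, 0, 2],   # T_cell
    [0, 1, 8, 3, 1],  [0, 1, 12, 3, 1],   # monocyte
], dtype=float)
labels = ["B_cell", "B_cell", "T_cell", "T_cell", "monocyte", "monocyte"]

sig, cell_types = signature_from_reference(cells_by_genes.T, labels)

# Simulated noise-free bulk profile mixed with known weights.
true_p = np.array([0.5, 0.3, 0.2])
bulk = sig @ true_p

p_hat = estimate_fractions(sig, bulk)
print(dict(zip(cell_types, np.round(p_hat, 4))))
```

Because the toy bulk profile is an exact, noise-free mixture, the estimate recovers the weights used to build it. A mixed-effect formulation would add random effects on top of this fixed structure; that extension is not shown here.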