Su Chang, Zhang Jingfei, Zhao Hongyu
Department of Biostatistics and Bioinformatics, Emory University.
Department of Biostatistics, Yale University.
J Am Stat Assoc. 2024;119(546):811-824. doi: 10.1080/01621459.2023.2297467. Epub 2024 Jan 31.
Inferring and characterizing gene co-expression networks has led to important insights into the molecular mechanisms of complex diseases. Most co-expression analyses to date have been performed on gene expression data collected from bulk tissues whose cell type compositions differ across samples. As a result, the co-expression estimates offer only an aggregated view of the underlying gene regulation and can be confounded by heterogeneity in cell type composition, failing to reveal gene coordination that may be distinct across cell types. In this paper, we introduce a flexible framework for estimating cell-type-specific gene co-expression networks from bulk sample data, without making specific assumptions about the distributions of gene expression profiles in different cell types. We develop a novel sparse least squares estimator, referred to as CSNet, that is efficient to implement and has good theoretical properties. Using CSNet, we analyzed the bulk gene expression data from a cohort study on Alzheimer's disease and identified previously unknown cell-type-specific co-expression among Alzheimer's disease risk genes, suggesting cell-type-specific disease mechanisms.
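To make the idea of a sparse least squares estimator for cell-type-specific co-expression concrete, the following is a minimal sketch and not the authors' CSNet implementation. It assumes a mixture model that the abstract does not spell out: each bulk profile is a proportion-weighted sum of independent cell-type-specific profiles, so the covariance of a bulk sample is the sum over cell types of the squared proportion times that cell type's covariance. Under that assumption, products of mean-centered bulk expressions can be regressed on squared proportions, and the resulting correlations can be thresholded for sparsity. The function name `cell_type_coexpression`, the soft-thresholding rule, the tuning parameter `lam`, and the simulated data are all hypothetical choices made for illustration.

```python
import numpy as np


def cell_type_coexpression(X, P, lam=0.1):
    """Sketch of a moment-based, thresholded least squares estimator.

    X   : (n, p) bulk expression matrix (samples by genes).
    P   : (n, K) cell type proportions for each sample.
    lam : hypothetical soft-threshold level applied to correlations.
    Returns an array of shape (K, p, p) of sparse correlation estimates.
    """
    n, p = X.shape
    K = P.shape[1]

    # Step 1: estimate cell-type-specific means by regressing X on P,
    # then center the bulk data with the fitted means.
    M, *_ = np.linalg.lstsq(P, X, rcond=None)
    R = X - P @ M

    # Step 2: regress every product of residuals on squared proportions.
    # Assumed model: Cov(x_ij, x_il) = sum_k P[i, k]**2 * Sigma_k[j, l].
    Y = (R[:, :, None] * R[:, None, :]).reshape(n, p * p)
    S, *_ = np.linalg.lstsq(P ** 2, Y, rcond=None)
    S = S.reshape(K, p, p)

    # Step 3: convert to correlations and soft-threshold the entries.
    out = np.empty_like(S)
    for k in range(K):
        d = np.sqrt(np.clip(np.diag(S[k]), 1e-12, None))
        C = S[k] / np.outer(d, d)
        C = np.sign(C) * np.maximum(np.abs(C) - lam, 0.0)
        np.fill_diagonal(C, 1.0)
        out[k] = np.clip(C, -1.0, 1.0)
    return out


# Simulated example (all values are made up for illustration).
rng = np.random.default_rng(0)
n, p = 2000, 4
P = rng.dirichlet([2.0, 2.0], size=n)

sigma1 = np.eye(p)
sigma1[0, 1] = sigma1[1, 0] = 0.8  # genes 0 and 1 co-express in cell type 1
sigma2 = np.eye(p)
sigma2[2, 3] = sigma2[3, 2] = 0.8  # genes 2 and 3 co-express in cell type 2

z1 = rng.multivariate_normal(np.full(p, 5.0), sigma1, size=n)
z2 = rng.multivariate_normal(np.full(p, 3.0), sigma2, size=n)
X = P[:, [0]] * z1 + P[:, [1]] * z2

est = cell_type_coexpression(X, P, lam=0.1)
print(np.round(est[0], 2))
print(np.round(est[1], 2))
```

In this simulation, the co-expression between genes 0 and 1 should appear only in the first estimated network and the co-expression between genes 2 and 3 only in the second, whereas a single correlation matrix computed from the bulk data would blend the two.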