Tang Chenwei, Sun Quan, Zeng Xinyue, Yang Xiaoyu, Liu Fei, Zhao Jinying, Shen Yin, Liu Bixiang, Wen Jia, Li Yun
Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA.
Institute for Human Genetics, University of California, San Francisco, San Francisco, CA, USA.
bioRxiv. 2024 May 24:2024.05.23.595514. doi: 10.1101/2024.05.23.595514.
Cell-type-specific (CTS) analysis is essential for revealing biological insights obscured in bulk tissue data. However, single-cell (sc) or single-nucleus (sn) resolution data remain cost-prohibitive for large-scale samples, so computational methods that deconvolve bulk tissue data are highly valuable. Here we present EPIC-unmix, a novel two-step empirical Bayesian method that integrates reference sc/sn RNA-seq data with bulk RNA-seq data from target samples to improve the accuracy of CTS inference. Comprehensive simulations across three tissues show that EPIC-unmix achieved 4.6%-109.8% higher accuracy than alternative methods. Applying EPIC-unmix to human bulk brain RNA-seq data from the ROSMAP and MSBB cohorts, we identified multiple genes differentially expressed between Alzheimer's disease (AD) cases and controls in a CTS manner, 57.4% of which are novel genes not identified using sc/snRNA-seq data of similar sample size, indicating the power of our approach. Among the 6%-69% of genes that overlap, 83%-100% show a direction of effect consistent with that from sc/snRNA-seq data, supporting the reliability of our findings. CTS expression profiles inferred by EPIC-unmix similarly empower CTS eQTL analysis. Among the novel eQTLs, we highlight a microglia eQTL for an AD risk gene that is obscured in bulk data and missed by sc/snRNA-seq-based eQTL analysis. The variant resides in a microglia-specific cCRE that forms a chromatin loop with the promoter region in microglia. Taken together, we believe EPIC-unmix will be a valuable tool for enabling more powerful CTS analysis.