Institute for Data Science and Informatics, University of Missouri, Columbia, MO 65201, USA.
Department of Information Science, University of North Texas, 3940 N Elm St, Denton, TX 76203, USA.
J Biomed Inform. 2024 Jun;154:104644. doi: 10.1016/j.jbi.2024.104644. Epub 2024 Apr 15.
OBJECTIVE: Gene expression analysis through single-cell RNA sequencing (scRNA-seq) has revolutionized our understanding of gene regulation in diverse cell types, tissues, and organisms. While existing methods primarily focus on identifying cell type-specific gene expression programs (GEPs), the characterization of GEPs associated with biological processes and stimulus responses remains limited. In this study, we aim to infer biologically meaningful GEPs that are associated with both cellular phenotypes and activity programs directly from scRNA-seq data.
METHODS: We applied linear CorEx, a machine-learning-based approach, to simulated and real-world scRNA-seq datasets, inferring GEPs by grouping genes according to a total correlation optimization function. Additionally, we used a transfer learning approach to project CorEx-inferred GEPs onto other scRNA-seq datasets.
RESULTS: On simulated data, linear CorEx, which groups genes by optimizing total correlation, outperforms similar methods in identifying cell types and activity programs. We then apply the same approach to real-world scRNA-seq data from the mouse dentate gyrus and embryonic colon development, uncovering biologically relevant GEPs related to cell types, developmental ages, and cell cycle programs. We also demonstrate the potential for transfer learning by evaluating similar datasets, showcasing the cross-species sensitivity of linear CorEx.
CONCLUSION: Our findings validate linear CorEx as a valuable tool for comprehensively analyzing complex signals in scRNA-seq data, leading to deeper insights into gene expression dynamics, cellular heterogeneity, and regulatory mechanisms.
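The quantity that the METHODS describe optimizing, total correlation, can be illustrated with a small numerical sketch. This is not the authors' implementation and does not use the linear CorEx software; it only assumes jointly Gaussian expression values, in which case the total correlation of a gene set equals -1/2 log det(R), where R is the correlation matrix of the genes. The simulated "program", the noise level, and the function name `gaussian_total_correlation` are all hypothetical choices made for illustration.

```python
import numpy as np


def gaussian_total_correlation(X):
    """Total correlation (in nats) of the columns of X, assuming joint Gaussianity.

    For Gaussian variables, TC = -0.5 * log det(R), with R the correlation matrix.
    """
    R = np.corrcoef(X, rowvar=False)
    _, logdet = np.linalg.slogdet(R)
    return -0.5 * logdet


rng = np.random.default_rng(0)
n_cells = 2000

# Hypothetical hidden activity of one gene expression program across cells.
program = rng.normal(size=n_cells)

# Four simulated genes driven by the same program, plus independent noise.
coexpressed = np.column_stack(
    [program + 0.5 * rng.normal(size=n_cells) for _ in range(4)]
)

# Four simulated genes with no shared driver.
unrelated = rng.normal(size=(n_cells, 4))

tc_coexpressed = gaussian_total_correlation(coexpressed)
tc_unrelated = gaussian_total_correlation(unrelated)

print(f"TC of co-expressed genes: {tc_coexpressed:.3f} nats")
print(f"TC of unrelated genes:    {tc_unrelated:.3f} nats")
```

Genes that share a driver have a much higher total correlation than independent genes, which is the intuition behind grouping genes into GEPs by this criterion; how linear CorEx actually searches for such groups is not specified in the abstract and is not reproduced here.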