Prediction of Gene Regulatory Connections with Joint Single-Cell Foundation Models and Graph-Based Learning.

Author information

Kommu Sindhura, Wang Yizhi, Wang Yue, Wang Xuan

Affiliations

Department of Computer Science, Virginia Tech, Blacksburg, 24061, Virginia, USA.

Department of Electrical and Computer Engineering, Virginia Tech, Arlington, 22203, Virginia, USA.

Publication information

bioRxiv. 2025 Jan 29:2024.12.16.628715. doi: 10.1101/2024.12.16.628715.

DOI:10.1101/2024.12.16.628715

PMID:39975293

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC11838224/

Abstract

MOTIVATION

Single-cell RNA sequencing (scRNA-seq) data offers unprecedented opportunities to infer gene regulatory networks (GRNs) at a fine-grained resolution, shedding light on cellular phenotypes at the molecular level. However, the high sparsity, noise, and dropout events inherent in scRNA-seq data pose significant challenges for accurate and reliable GRN inference. The rapid growth in experimentally validated transcription factor-DNA binding data (e.g., ChIP-seq) has enabled supervised machine learning methods, which rely on known gene regulatory interactions to learn patterns, and achieve high accuracy in GRN inference by framing it as a gene regulatory link prediction task. This study addresses the gene regulatory link prediction problem by learning informative vectorized representations at the gene level to predict missing regulatory interactions. However, a higher performance of supervised learning methods requires a large amount of known TF-DNA binding data, which is often experimentally expensive and therefore limited in amount. Advances in large-scale pre-training and transfer learning provide a transformative opportunity to address this challenge. In this study, we leverage large-scale pre-trained models, trained on extensive scRNA-seq datasets and known as single-cell foundation models (scFMs). These models are combined with joint graph-based learning to establish a robust foundation for gene regulatory link prediction.

RESULTS

We propose scRegNet, a novel and effective framework that leverages scFMs with joint graph-based learning for gene regulatory link prediction. scRegNet achieves state-of-the-art results in comparison with nine baseline methods on seven scRNA-seq benchmark datasets. In addition, scRegNet is more robust than the baseline methods on noisy training data.

AVAILABILITY

The source code is available at https://github.com/sindhura-cs/scRegNet.
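
The link-prediction framing described in the abstract can be illustrated with a toy sketch. This is not the scRegNet implementation: the scFM gene embeddings are replaced by random vectors, the graph-based learner is replaced by a single unweighted neighbor-averaging step, the classifier is plain logistic regression, and the regulatory labels are random placeholders. All function names, dimensions, and hyperparameters below are assumptions chosen for illustration only.

```python
import numpy as np


def one_hop_aggregate(features, adjacency):
    """Average each gene's features with those of its graph neighbors.

    A weight-free stand-in for one graph-convolution step (assumption).
    """
    a_hat = adjacency + np.eye(adjacency.shape[0])  # add self-loops
    return (a_hat / a_hat.sum(axis=1, keepdims=True)) @ features


def pair_features(gene_emb, pairs):
    """Build one feature vector per (regulator, target) pair."""
    src, dst = gene_emb[pairs[:, 0]], gene_emb[pairs[:, 1]]
    return np.concatenate([src, dst, src * dst], axis=1)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def log_loss(x, y, w, b):
    """Mean binary cross-entropy of a logistic model on (x, y)."""
    p = np.clip(sigmoid(x @ w + b), 1e-12, 1 - 1e-12)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))


def fit_logistic(x, y, lr=0.1, epochs=300):
    """Fit logistic regression with full-batch gradient descent."""
    w, b = np.zeros(x.shape[1]), 0.0
    for _ in range(epochs):
        err = sigmoid(x @ w + b) - y
        w -= lr * (x.T @ err) / len(y)
        b -= lr * err.mean()
    return w, b


def run_demo(seed=0, n_genes=30, n_pos=20, n_neg=20):
    rng = np.random.default_rng(seed)
    # Placeholder for pre-trained scFM gene embeddings (random here).
    scfm_emb = rng.normal(size=(n_genes, 8))
    # Placeholder for per-gene expression-derived features (random here).
    expr_feat = rng.normal(size=(n_genes, 5))

    # Sample distinct ordered gene pairs; the first n_pos are treated as
    # "known" regulatory links, the rest as negatives (random labels).
    all_pairs = np.array(
        [(i, j) for i in range(n_genes) for j in range(n_genes) if i != j]
    )
    pairs = all_pairs[rng.permutation(len(all_pairs))[: n_pos + n_neg]]
    labels = np.r_[np.ones(n_pos), np.zeros(n_neg)]

    # Prior graph built from the known links only, symmetrized.
    adjacency = np.zeros((n_genes, n_genes))
    adjacency[pairs[:n_pos, 0], pairs[:n_pos, 1]] = 1.0
    adjacency = np.maximum(adjacency, adjacency.T)

    # Joint gene representation: scFM part plus graph-aggregated part.
    graph_emb = one_hop_aggregate(expr_feat, adjacency)
    joint = np.concatenate([scfm_emb, graph_emb], axis=1)

    x = pair_features(joint, pairs)
    loss_before = log_loss(x, labels, np.zeros(x.shape[1]), 0.0)
    w, b = fit_logistic(x, labels)
    return joint, sigmoid(x @ w + b), loss_before, log_loss(x, labels, w, b)


if __name__ == "__main__":
    joint, probs, before, after = run_demo()
    print("joint embedding shape:", joint.shape)
    print("training loss before/after:", before, after)
```

Because the labels here are random, the fitted scores carry no biological meaning; the sketch only shows how two gene-level representations can be concatenated and turned into pairwise scores for candidate regulatory links.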

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a2dd/11838224/49e67b6eb22c/nihpp-2024.12.16.628715v2-f0001.jpg
