Department of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan, Shandong 250061, China.
Bioinformatics. 2022 Sep 30;38(19):4522-4529. doi: 10.1093/bioinformatics/btac559.
Single-cell RNA sequencing (scRNA-seq) data provide unprecedented opportunities to reconstruct gene regulatory networks (GRNs) at fine-grained resolution. Numerous unsupervised or self-supervised models have been proposed to infer GRNs from bulk RNA-seq data, but few of them are suited to scRNA-seq data, which suffer from a low signal-to-noise ratio and dropout events. Fortunately, the surge in TF-DNA binding data (e.g. ChIP-seq) makes supervised GRN inference possible. We treat supervised GRN inference as a graph-based link prediction problem: the aim is to learn low-dimensional vector representations of genes from which potential regulatory interactions can be predicted.
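To make the link prediction framing concrete, the following is a minimal sketch, not the authors' implementation. It assumes that a TF-gene pair is scored by a sigmoid over the dot product of the two gene embeddings, that negative pairs are sampled uniformly from pairs absent in the ground-truth network, and that training would minimise binary cross-entropy. The abstract confirms none of these choices, and the gene names, embeddings and helper functions are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def link_score(z_tf, z_gene):
    """Score in (0, 1) for a candidate TF -> gene edge (assumed dot-product decoder)."""
    return sigmoid(sum(a * b for a, b in zip(z_tf, z_gene)))


def sample_negatives(tfs, genes, positives, k, seed=0):
    """Draw k TF-gene pairs that do not occur in the known positive set."""
    rng = random.Random(seed)
    candidates = [(t, g) for t in tfs for g in genes
                  if t != g and (t, g) not in positives]
    return rng.sample(candidates, k)


def bce_loss(pairs, labels, emb):
    """Mean binary cross-entropy of the predicted scores against 0/1 labels."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for (t, g), y in zip(pairs, labels):
        p = link_score(emb[t], emb[g])
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(pairs)


# Hypothetical 2-dimensional gene embeddings (placeholders, not real data).
emb = {
    "TF1": [1.0, 0.5],
    "TF2": [-0.5, 1.0],
    "G1": [0.9, 0.4],
    "G2": [-1.0, 0.2],
    "G3": [0.1, -0.8],
}
# Hypothetical known regulatory edges, e.g. derived from ChIP-seq.
positives = {("TF1", "G1"), ("TF2", "G2")}
negatives = sample_negatives(["TF1", "TF2"], ["G1", "G2", "G3"], positives, k=2)

pairs = sorted(positives) + negatives
labels = [1] * len(positives) + [0] * len(negatives)
loss = bce_loss(pairs, labels, emb)
```

In a real pipeline the embeddings would be produced by a trainable encoder rather than fixed by hand, and the loss would be back-propagated through that encoder.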
In this paper, we present GENELink, which uses a graph attention network to infer latent interactions between transcription factors (TFs) and target genes in a GRN. GENELink projects single-cell gene expression, together with observed TF-gene pairs, into a low-dimensional space. By optimizing this embedding space, it learns gene representations that serve downstream similarity measurement or causal inference for gene pairs. Compared with eight existing GRN reconstruction methods, GENELink achieves comparable or better performance on seven scRNA-seq datasets with four types of ground-truth networks. We further apply GENELink to scRNA-seq data from human breast cancer metastasis and reveal regulatory heterogeneity of the Notch and Wnt signalling pathways between primary tumour and lung metastasis. Moreover, ontology enrichment of the GRN unique to lung metastasis indicates that mitochondrial oxidative phosphorylation (OXPHOS) is functionally important during the seeding step of the cancer metastatic cascade, a finding validated by pharmacological assays.
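As an illustration of how a graph attention network combines expression features with known TF-gene links, here is a minimal sketch of one generic single-head graph attention layer in pure Python. It is not GENELink's actual architecture: the abstract does not specify the number of heads or layers, the edge directionality or the activation functions, so those are assumptions. The weight matrix `W`, attention vector `a`, toy expression values and neighbour lists are all hypothetical.

```python
import math


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def gat_layer(H, neighbours, W, a):
    """One single-head graph attention layer.

    H: dict gene -> input feature vector (e.g. expression across cells)
    neighbours: dict gene -> list of linked genes (known TF-gene pairs)
    W: projection matrix (out_dim x in_dim); a: attention vector (2 * out_dim)
    Returns (new embeddings, attention weights per gene).
    """
    Z = {g: matvec(W, h) for g, h in H.items()}  # linear projection
    out, attn = {}, {}
    for i in H:
        # Each gene attends to itself and to its linked genes.
        nbrs = [i] + [j for j in neighbours.get(i, []) if j != i]
        logits = [leaky_relu(sum(c * v for c, v in zip(a, Z[i] + Z[j])))
                  for j in nbrs]
        m = max(logits)  # subtract the max for a numerically stable softmax
        exps = [math.exp(l - m) for l in logits]
        total = sum(exps)
        alphas = [e / total for e in exps]
        attn[i] = dict(zip(nbrs, alphas))
        # Attention-weighted sum of the projected neighbour features.
        out[i] = [sum(al * Z[j][d] for al, j in zip(alphas, nbrs))
                  for d in range(len(Z[i]))]
    return out, attn


# Toy input: three genes measured in three cells (hypothetical values).
H = {"TF1": [1.0, 0.0, 2.0], "G1": [0.5, 1.0, 0.0], "G2": [0.0, 0.0, 1.0]}
neighbours = {"TF1": ["G1"], "G1": ["TF1"]}  # G2 has no known link
W = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]     # projects 3 cells -> 2 dimensions
a = [0.1, -0.3, 0.2, 0.4]

embeddings, attention = gat_layer(H, neighbours, W, a)
```

The resulting two-dimensional embeddings could then be passed to a pairwise scoring function to predict whether a TF regulates a gene. A gene with no known links, such as `G2` here, simply keeps its own projected features.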
The code and data are available at https://github.com/zpliulab/GENELink.
Supplementary data are available at Bioinformatics online.