Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Deya, 410073 Changsha, China.
Laboratory of Software Engineering for Complex System, National University of Defense Technology, Deya, 410073 Changsha, China.
Brief Bioinform. 2023 Sep 22;24(6). doi: 10.1093/bib/bbad414.
Single-cell RNA sequencing (scRNA-seq) has emerged as a powerful technique for studying gene expression patterns at the single-cell level. Inferring gene regulatory networks (GRNs) from scRNA-seq data provides insight into cellular phenotypes at the genomic level. However, the high sparsity, noise and dropout events inherent in scRNA-seq data make GRN inference challenging. In recent years, the dramatic increase in experimentally validated data on transcription factor binding to DNA has made it possible to infer GRNs with supervised methods. In this study, we frame GRN inference as a graph link prediction task and propose a novel framework, GNNLink, which leverages known GRNs to deduce potential regulatory interdependencies between genes. First, we preprocess the raw scRNA-seq data. Then, we introduce a graph convolutional network-based interaction graph encoder that refines gene features by capturing interdependencies between nodes in the network. Finally, the GRN is inferred by performing a matrix completion operation on the node features. The features obtained from model training can be applied to downstream tasks such as measuring similarity and inferring causality between gene pairs. To evaluate the performance of GNNLink, we compare it with six existing GRN reconstruction methods on seven scRNA-seq datasets. These datasets encompass diverse ground-truth networks, including functional interaction networks, loss-of-function/gain-of-function data, non-specific ChIP-seq data and cell-type-specific ChIP-seq data. Our experimental results demonstrate that GNNLink achieves comparable or superior performance across these datasets, showcasing its robustness and accuracy. Furthermore, we observe consistent performance across datasets of varying scales. For reproducibility, we provide the data and source code of GNNLink in our GitHub repository: https://github.com/sdesignates/GNNLink.
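The encode-then-score idea in the abstract (a graph convolutional encoder over known regulatory links, followed by scoring gene pairs from the node features) can be illustrated with a minimal sketch. This is not the GNNLink implementation: the single layer, the tanh nonlinearity, the inner-product scoring, the random untrained weights, the toy expression matrix and all function names are assumptions made for illustration. The sketch also treats edges as undirected, so it does not distinguish regulator from target as a real GRN method must.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def normalized_adjacency(n_genes, edges):
    """Return D^{-1/2} (A + I) D^{-1/2}, treating known links as undirected."""
    a = [[1.0 if i == j else 0.0 for j in range(n_genes)] for i in range(n_genes)]
    for tf, target in edges:
        a[tf][target] = a[target][tf] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n_genes)]
            for i in range(n_genes)]


def gcn_layer(a_hat, features, weights):
    """One graph convolution: tanh(A_hat @ X @ W). The tanh is an assumed choice."""
    h = matmul(matmul(a_hat, features), weights)
    return [[math.tanh(v) for v in row] for row in h]


def score_links(embeddings):
    """Assumed decoder: sigmoid of the dot product for every gene pair."""
    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))
    return [[sigmoid(sum(x * y for x, y in zip(u, v))) for v in embeddings]
            for u in embeddings]


# Toy data (hypothetical): 4 genes x 3 cells of preprocessed expression values.
expression = [
    [0.0, 1.2, 0.8],
    [0.5, 0.0, 1.1],
    [0.0, 0.9, 0.0],
    [1.3, 0.0, 0.0],
]
known_edges = [(0, 1), (0, 2)]  # hypothetical known links: gene 0 regulates 1 and 2

rng = random.Random(0)
# Untrained random weights (3 input features -> 2 embedding dimensions).
weights = [[rng.gauss(0.0, 0.5) for _ in range(2)] for _ in range(3)]

a_hat = normalized_adjacency(len(expression), known_edges)
embeddings = gcn_layer(a_hat, expression, weights)
scores = score_links(embeddings)

# Rank gene pairs that are not already known links by their predicted score.
known = {frozenset(e) for e in known_edges}
candidates = sorted(
    ((scores[i][j], i, j) for i in range(4) for j in range(i + 1, 4)
     if frozenset((i, j)) not in known),
    reverse=True,
)
```

In a supervised setting the weights would be fitted so that known links score high and sampled non-links score low; here they are left random, so the ranking in `candidates` carries no biological meaning.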