Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY 10003, USA.
Brief Bioinform. 2023 Sep 22;24(6). doi: 10.1093/bib/bbad370.
Gene regulatory networks (GRNs) drive organism structure and function, so discovering and characterizing GRNs is a major goal in biological research. However, accurately identifying causal regulatory connections and inferring GRNs from gene expression datasets, more recently those from single-cell RNA-seq (scRNA-seq), has been challenging. Here we employ the innovative method of Causal Inference Using Composition of Transactions (CICT) to uncover GRNs from scRNA-seq data. The basis of CICT is that if all gene expression were random, a non-random regulatory gene should induce its targets at levels different from the background random process, resulting in distinct patterns in the whole relevance network of gene-gene associations. CICT proposes novel network features derived from a relevance network, which enable any machine learning algorithm to predict causal regulatory edges and infer GRNs. We evaluated CICT using simulated and experimental scRNA-seq data in a well-established benchmarking pipeline and showed that CICT outperformed existing network inference methods representing diverse approaches, with many-fold higher accuracy. Furthermore, we demonstrated that GRN inference with CICT was robust to different levels of sparsity in scRNA-seq data, the characteristics of the data and ground truth, the choice of association measure and the complexity of the supervised machine learning algorithm. Our results suggest that directly predicting causality to recover regulatory relationships in complex biological networks substantially improves the accuracy of GRN inference.
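The general idea of deriving per-edge features from a relevance network can be illustrated with a minimal sketch. This is not the CICT implementation: the association measure (absolute Pearson correlation), the seven features, the function names `relevance_network` and `edge_features`, and the toy data are all assumptions chosen for illustration. It only shows how each candidate edge can be described relative to the association distributions of its two endpoint genes, so that a supervised classifier could later be trained on edges with known labels.

```python
import numpy as np

def relevance_network(expr):
    """Build a gene-gene relevance network from a genes x cells matrix.

    Assumption: absolute Pearson correlation is used as the association
    measure; any other pairwise measure could be substituted.
    """
    w = np.abs(np.corrcoef(expr))
    np.fill_diagonal(w, 0.0)  # ignore self-associations
    return w

def edge_features(w):
    """Return (pairs, features) for every ordered gene pair (i, j), i != j.

    Illustrative features only (hypothetical, not the published set):
    the raw association, its share of the source's and the target's total
    association, and the mean and standard deviation of each endpoint's
    associations with all other genes.
    """
    n = w.shape[0]
    off_diag = ~np.eye(n, dtype=bool)
    out_share = w / w.sum(axis=1, keepdims=True)  # w_ij / sum_k w_ik
    in_share = w / w.sum(axis=0, keepdims=True)   # w_ij / sum_k w_kj
    node_mean = np.array([w[i, off_diag[i]].mean() for i in range(n)])
    node_std = np.array([w[i, off_diag[i]].std() for i in range(n)])

    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    feats = np.array([
        [w[i, j], out_share[i, j], in_share[i, j],
         node_mean[i], node_std[i], node_mean[j], node_std[j]]
        for i, j in pairs
    ])
    return pairs, feats

# Toy data: 5 genes x 200 cells, with gene 1 made to depend on gene 0.
rng = np.random.default_rng(0)
expr = rng.poisson(2.0, size=(5, 200)).astype(float)
expr[1] = 2.0 * expr[0] + rng.normal(0.0, 0.1, size=200)

w = relevance_network(expr)
pairs, X = edge_features(w)
```

In a supervised setting, rows of `X` whose pairs appear in a ground-truth network would be labeled as regulatory edges, and any standard classifier could be fit on them. The sketch assumes every gene varies across cells, since a constant row makes the correlation undefined.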