Qin Hang, Garbulowski Mateusz, Sonnhammer Erik L L, Chatterjee Saikat
Digital Futures, and School of Electrical Engineering and Computer Science, KTH Royal Institute of Technology, Stockholm 11428, Sweden.
Department of Biochemistry and Biophysics, Stockholm University, Science for Life Laboratory, Solna 17121, Sweden.
Bioinformatics. 2025 Jun 2;41(6). doi: 10.1093/bioinformatics/btaf318.
Inferring gene regulatory networks (GRNs) is challenging because the GRN matrix is inherently sparse and expression data are noisy, which often leads to a high likelihood of false positive or false negative predictions. Addressing this requires a method that leverages the sparsity of the GRN matrix and remains robust across varying levels of noise in the data. Moreover, most existing GRN inference methods produce only fixed point estimates, which lack the flexibility and informativeness needed for comprehensive network analysis. In contrast, a Bayesian approach that yields closed-form posterior distributions allows probabilistic link selection, offering insight into the statistical confidence of each possible link. It is therefore important to develop a Bayesian GRN inference method and to benchmark it rigorously against state-of-the-art methods.
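To make the idea of probabilistic link selection concrete, the following is a minimal sketch and not the authors' procedure. It assumes each link weight has a Gaussian posterior summarized by a mean and a standard deviation, and it scores a link by the posterior probability that the weight has the same sign as its mean. The function names, the example values, and the 0.95 threshold are all hypothetical.

```python
import math
import numpy as np


def link_confidence(post_mean, post_std):
    """Posterior probability that each weight shares the sign of its mean.

    Assumes an independent Gaussian posterior per link (an illustrative
    assumption, not something stated in the abstract).
    """
    z = np.abs(post_mean) / post_std
    # Standard normal CDF written with the error function.
    normal_cdf = np.vectorize(
        lambda v: 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))
    )
    return normal_cdf(z)


def select_links(post_mean, post_std, threshold=0.95):
    """Keep links whose sign confidence reaches the (hypothetical) threshold."""
    return link_confidence(post_mean, post_std) >= threshold


# Toy 2-gene example: rows are target genes, columns are regulators.
post_mean = np.array([[0.0, 0.8], [-0.05, 0.0]])
post_std = np.array([[0.1, 0.2], [0.3, 0.1]])

confidence = link_confidence(post_mean, post_std)
selected = select_links(post_mean, post_std)
print(selected)
```

A point estimate alone would report the weight -0.05 as a nonzero link. With the posterior spread taken into account, its sign confidence is close to 0.5, so it is not selected.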
We propose a method, Bayesian inference of GRN via Sparse Modelling (BiGSM). BiGSM exploits the sparsity of the GRN matrix and infers the posterior distributions of GRN links from noisy expression data using maximum-likelihood-based learning. We benchmarked BiGSM thoroughly on biological and simulated datasets, including GeneNetWeaver, GeneSPIDER, and GRNbenchmark. The benchmark evaluates accuracy and robustness across varying noise levels and data models. On point-estimate-based performance measures, BiGSM provides the best overall performance in comparison with several state-of-the-art methods, including GENIE3, LASSO, LSCON, and Zscore. Additionally, BiGSM is the only method among those compared that provides posteriors for the GRN weights, which helps to assess confidence across predictions.
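As an illustration of how a sparsity-promoting Bayesian model can return a posterior for every weight, the sketch below implements generic sparse Bayesian learning for a single linear regression. It uses a per-weight Gaussian prior whose precisions, along with the noise variance, are fitted by EM updates that maximize the marginal likelihood. This is an assumption about the general family of techniques, not a reproduction of BiGSM. The linear model, the function name `sparse_bayes_regression`, and the synthetic data are hypothetical; the authors' actual model and update rules are in the linked repository.

```python
import numpy as np


def sparse_bayes_regression(X, y, n_iter=200):
    """Generic sparse Bayesian linear regression (illustrative sketch).

    Model assumed here: y = X @ w + noise, with prior w_i ~ N(0, 1/alpha_i).
    Returns the posterior mean and variance of each weight and the
    estimated noise variance.
    """
    n, d = X.shape
    alpha = np.ones(d)                  # prior precision of each weight
    sigma2 = 0.1 * np.var(y) + 1e-12    # initial noise variance guess
    XtX, Xty = X.T @ X, X.T @ y
    for _ in range(n_iter):
        # Closed-form Gaussian posterior for the current hyperparameters.
        Sigma = np.linalg.inv(np.diag(alpha) + XtX / sigma2)
        mu = Sigma @ Xty / sigma2
        post_var = np.diag(Sigma)
        # EM updates of the hyperparameters (marginal likelihood ascent).
        gamma = 1.0 - alpha * post_var
        resid = y - X @ mu
        sigma2 = (resid @ resid + sigma2 * gamma.sum()) / n
        alpha = 1.0 / (mu**2 + post_var + 1e-12)
    return mu, post_var, sigma2


# Synthetic target gene with two true regulators out of ten candidates.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))
w_true = np.zeros(10)
w_true[0], w_true[3] = 1.5, -2.0
y = X @ w_true + 0.1 * rng.normal(size=50)

mu, post_var, sigma2 = sparse_bayes_regression(X, y)
print(np.round(mu, 2))
print(np.round(np.sqrt(post_var), 3))
```

In a GRN setting, one such regression could be run per target gene to fill in one row of the network matrix. The posterior variances are what distinguish this output from a point estimate such as a LASSO coefficient.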
Code implemented in MATLAB and Python is available on GitHub: https://github.com/SachLab/BiGSM and archived on Zenodo.