IEEE/ACM Trans Comput Biol Bioinform. 2023 Jan-Feb;20(1):399-409. doi: 10.1109/TCBB.2022.3144418. Epub 2023 Feb 3.
The identification of gene regulatory networks (GRNs) from gene expression time series data is a challenging open problem in systems biology. This paper considers inferring GRN structure from incomplete and noisy gene expression data, an issue that has not been well studied in GRN inference. The dynamical behavior of the gene expression process is described by a stochastic nonlinear state-space model with unknown noise information. A variational Bayesian (VB) framework is proposed to estimate the parameters and the gene expression levels simultaneously. One advantage of this method is that it can easily handle missing observations by generating predicted values in their place. To account for the sparsity of GRNs, the smoothed gene data are modeled by extreme gradient boosting trees, and the regulatory interactions among genes are identified from the importance scores of the tree model. The proposed method is tested on the artificial DREAM4 datasets and one real gene expression dataset of yeast. The comparative results show that the proposed method can effectively recover the regulatory interactions of a GRN in the presence of missing observations and outperforms existing methods for GRN identification.
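The abstract notes that missing observations can be handled by substituting predicted values. The sketch below illustrates that idea only. It is not the paper's method: it swaps the stochastic nonlinear model and the VB estimator for a scalar linear Kalman filter with known noise variances. The function name `filter_with_missing` and the parameters `a`, `q`, `r`, `x0`, and `p0` are hypothetical, chosen for illustration.

```python
from typing import List, Optional, Tuple


def filter_with_missing(
    observations: List[Optional[float]],
    a: float = 0.9,    # assumed state-transition coefficient
    q: float = 0.01,   # assumed process-noise variance
    r: float = 0.04,   # assumed measurement-noise variance
    x0: float = 0.0,   # assumed prior mean of the expression level
    p0: float = 1.0,   # assumed prior variance
) -> Tuple[List[float], List[float]]:
    """Scalar Kalman filter; a missing observation is marked with None."""
    means: List[float] = []
    variances: List[float] = []
    x, p = x0, p0
    for y in observations:
        # Prediction step: propagate the estimate through the model.
        x_pred = a * x
        p_pred = a * a * p + q
        if y is None:
            # Missing observation: keep the prediction as the estimate.
            x, p = x_pred, p_pred
        else:
            # Update step: correct the prediction with the measurement.
            gain = p_pred / (p_pred + r)
            x = x_pred + gain * (y - x_pred)
            p = (1.0 - gain) * p_pred
        means.append(x)
        variances.append(p)
    return means, variances


if __name__ == "__main__":
    series = [1.0, 0.8, None, 0.7, None, None, 0.4]
    est_means, est_vars = filter_with_missing(series)
    for t, (y, m, v) in enumerate(zip(series, est_means, est_vars)):
        label = "missing" if y is None else f"{y:.2f}"
        print(f"t={t} obs={label:>7} mean={m:.3f} var={v:.4f}")
```

At a missing time point the estimate is the model prediction alone, so its variance grows until the next measurement arrives. In the paper's setting the noise statistics are unknown and the model is nonlinear, which is why a VB scheme is used there instead of a fixed-parameter filter like this one.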