School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China.
School of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing, China.
BMC Genomics. 2017 Nov 17;18(Suppl 9):844. doi: 10.1186/s12864-017-4228-y.
The reconstruction of gene regulatory network (GRN) from gene expression data can discover regulatory relationships among genes and gain deep insights into the complicated regulation mechanism of life. However, it is still a great challenge in systems biology and bioinformatics. During the past years, numerous computational approaches have been developed for this goal, and Bayesian network (BN) methods draw most of attention among these methods because of its inherent probability characteristics. However, Bayesian network methods are time consuming and cannot handle large-scale networks due to their high computational complexity, while the mutual information-based methods are highly effective but directionless and have a high false-positive rate.
To solve these problems, we propose a Candidate Auto Selection algorithm (CAS) based on mutual information and breakpoint detection to restrict the search space in order to accelerate the learning process of Bayesian network. First, the proposed CAS algorithm automatically selects the neighbor candidates of each node before searching the best structure of GRN. Then based on CAS algorithm, we propose a globally optimal greedy search method (CAS + G), which focuses on finding the highest rated network structure, and a local learning method (CAS + L), which focuses on faster learning the structure with little loss of quality.
Results show that the proposed CAS algorithm can effectively reduce the search space of Bayesian networks through identifying the neighbor candidates of each node. In our experiments, the CAS + G method outperforms the state-of-the-art method on simulation data for inferring GRNs, and the CAS + L method is significantly faster than the state-of-the-art method with little loss of accuracy. Hence, the CAS based methods effectively decrease the computational complexity of Bayesian network and are more suitable for GRN inference.
从基因表达数据中重建基因调控网络(GRN)可以发现基因之间的调控关系,并深入了解生命的复杂调控机制。然而,这在系统生物学和生物信息学中仍然是一个巨大的挑战。在过去的几年中,已经开发出许多计算方法来实现这一目标,而贝叶斯网络(BN)方法由于其固有的概率特性而成为这些方法中最受关注的方法。然而,由于其计算复杂度高,贝叶斯网络方法耗时且无法处理大规模网络,而基于互信息的方法则非常有效但无方向且假阳性率高。
为了解决这些问题,我们提出了一种基于互信息和断点检测的候选自动选择算法(CAS)来限制搜索空间,以加速贝叶斯网络的学习过程。首先,所提出的 CAS 算法在搜索 GRN 的最佳结构之前自动选择每个节点的邻居候选。然后,基于 CAS 算法,我们提出了一种全局最优贪婪搜索方法(CAS+G),该方法专注于找到最高评分的网络结构,以及一种局部学习方法(CAS+L),该方法专注于更快地学习结构而不会降低质量。
结果表明,所提出的 CAS 算法可以通过识别每个节点的邻居候选来有效减少贝叶斯网络的搜索空间。在我们的实验中,CAS+G 方法在模拟数据上优于用于推断 GRN 的最新方法,而 CAS+L 方法的速度明显快于最新方法,且准确性损失很小。因此,基于 CAS 的方法有效地降低了贝叶斯网络的计算复杂度,更适合 GRN 推断。