IEEE/ACM Trans Comput Biol Bioinform. 2021 Jul-Aug;18(4):1325-1335. doi: 10.1109/TCBB.2019.2944826. Epub 2021 Aug 6.
Bayesian networks are a powerful method for identifying causal relationships among variables. However, as the network size increases, the time complexity of searching for the optimal structure grows exponentially. We propose a novel search algorithm, Fast and Furious Bayesian Network (FFBN). Compared to the existing greedy search algorithm, FFBN uses significantly fewer model configuration rules to determine the causal direction of edges when constructing the Bayesian network, which greatly improves computational speed. We benchmarked FFBN by reconstructing gene regulatory networks (GRNs) from two DREAM5 challenge datasets: a synthetic dataset and a larger yeast transcriptome dataset. On both datasets, FFBN is much faster than the existing greedy search algorithm while maintaining equally good or better recall and precision. We then constructed three whole-transcriptome GRNs from primary liver cancer (PL), primary colon cancer (PC), and colon-to-liver metastasis (CLM) expression data, a task at which the existing greedy search algorithm failed. The three GRNs contain 12,099 common genes. FFBN is thus able to build GRNs at an unprecedented scale of more than 10,000 genes. Using FFBN, we discovered that CLM has its own unique cancer molecular mechanisms while sharing a certain degree of similarity with both PL and PC.
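As a rough illustration of the recall and precision figures used in the benchmark, the sketch below scores a set of predicted directed edges against a gold-standard network. It is not taken from the paper: the function name `edge_precision_recall`, the toy gene labels, and the choice to count a reversed edge as an error are all assumptions made for this example.

```python
def edge_precision_recall(predicted, gold):
    """Score predicted directed edges against a gold-standard edge set.

    Each edge is a (regulator, target) tuple. Direction matters here,
    so ("A", "B") and ("B", "A") are treated as different edges
    (an assumption of this sketch, not a detail from the paper).
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    # Guard against empty sets to avoid division by zero.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical toy networks with made-up gene labels.
gold_edges = {("G1", "G2"), ("G1", "G3"), ("G2", "G4"), ("G3", "G4")}
predicted_edges = {("G1", "G2"), ("G2", "G4"), ("G4", "G3"), ("G1", "G4")}

precision, recall = edge_precision_recall(predicted_edges, gold_edges)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case two of the four predicted edges appear in the gold standard, so both scores are 0.5. The reversed edge ("G4", "G3") counts as a miss, which is why getting the causal direction of an edge right affects both metrics.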