Department of Biosystems Science and Engineering, ETH Zurich, Mattenstrasse 26, 4058 Basel, Switzerland.
SIB Swiss Institute of Bioinformatics, Switzerland.
Brief Bioinform. 2022 Jul 18;23(4). doi: 10.1093/bib/bbac219.
Dynamic Bayesian networks (DBNs) can be used to discover gene regulatory networks (GRNs) from time series gene expression data. Here, we propose a strategy for learning DBNs from gene expression data using a Bayesian approach that scales to large networks and is aimed at learning models with high predictive accuracy. Our framework can learn DBNs for multiple groups of samples and highlight differences and similarities in their GRNs. We learn these DBN models under different structural and parametric assumptions and select the optimal model based on cross-validated predictive accuracy. We show in simulation studies that our approach is better equipped to prevent overfitting than techniques used in previous studies. We applied the proposed DBN-based approach to two time series transcriptomic datasets from the Gene Expression Omnibus database, each comprising data from distinct phenotypic groups of the same tissue type. In the first case, we used DBNs to characterize responders and non-responders to anti-cancer therapy. In the second case, we compared normal and tumor cells of colorectal tissue. For both datasets, the classification accuracy of the DBN-based classifier was higher than previously reported. For the colorectal cancer dataset, our analysis suggested that the GRNs of cancer and normal tissues differ substantially, with the differences most pronounced in the neighborhoods of oncogenes and known cancer tissue markers. The identified differences between the gene networks of cancer and normal cells may be used for the discovery of targeted therapies.
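The per-group learning, likelihood-based classification, and cross-validated model selection described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a first-order linear-Gaussian transition model fitted by ridge regression in place of Bayesian structure learning, it uses simulated toy data rather than the GEO datasets, and all names (`fit_dbn`, `log_lik`, `classify`, `cv_accuracy`, `simulate`) are hypothetical.

```python
import numpy as np

def fit_dbn(series, ridge):
    """Fit x[t+1] = x[t] @ W + b + noise by ridge regression.

    Assumption: a linear-Gaussian, first-order transition model stands in
    for a learned DBN; `ridge` plays the role of a model assumption to tune.
    """
    X = np.vstack([s[:-1] for s in series])  # expression at time t
    Y = np.vstack([s[1:] for s in series])   # expression at time t+1
    mu_x, mu_y = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mu_x, Y - mu_y
    W = np.linalg.solve(Xc.T @ Xc + ridge * np.eye(X.shape[1]), Xc.T @ Yc)
    b = mu_y - mu_x @ W
    var = (Y - (X @ W + b)).var(axis=0) + 1e-6  # per-gene noise variance
    return W, b, var

def log_lik(model, s):
    """Gaussian log-likelihood of the transitions in one time series."""
    W, b, var = model
    resid = s[1:] - (s[:-1] @ W + b)
    return float(-0.5 * np.sum(np.log(2 * np.pi * var) + resid ** 2 / var))

def classify(models, s):
    """Assign a series to the group whose model explains it best."""
    return max(models, key=lambda g: log_lik(models[g], s))

def cv_accuracy(data, ridge, n_folds=5):
    """Cross-validated classification accuracy for one ridge setting.

    data: dict mapping group label -> list of (time, genes) arrays.
    """
    items = [(g, s) for g, group in data.items() for s in group]
    correct = 0
    for k in range(n_folds):
        test = items[k::n_folds]
        train = [it for i, it in enumerate(items) if i % n_folds != k]
        models = {g: fit_dbn([s for gg, s in train if gg == g], ridge)
                  for g in data}
        correct += sum(classify(models, s) == g for g, s in test)
    return correct / len(items)

def simulate(W, n_series, T, rng):
    """Toy data generator (hypothetical, for illustration only)."""
    out = []
    for _ in range(n_series):
        x = rng.normal(size=W.shape[0])
        rows = [x]
        for _ in range(T - 1):
            x = x @ W + 0.3 * rng.normal(size=W.shape[0])
            rows.append(x)
        out.append(np.array(rows))
    return out

rng = np.random.default_rng(0)
W_a = 0.5 * np.eye(4)
W_b = W_a.copy()
W_b[0, 1] = 0.8  # group B has one extra edge: gene 0 -> gene 1
data = {"A": simulate(W_a, 20, 10, rng), "B": simulate(W_b, 20, 10, rng)}

# Select the setting with the best cross-validated accuracy.
scores = {r: cv_accuracy(data, r) for r in (0.1, 1.0, 10.0)}
best = max(scores, key=scores.get)
print(best, scores[best])
```

Choosing the setting by held-out accuracy rather than by fit to the training series is what guards against overfitting in this sketch. The same loop could compare other structural or parametric assumptions in place of the ridge penalty.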