The Cancer, Ageing and Somatic Mutation Programme, Wellcome Sanger Institute, Hinxton, Cambridgeshire, CB10 1SA, UK.
Systems Immunity Research Institute, Medical School, Cardiff University, Cardiff, CF14 4XN, UK.
Commun Biol. 2022 Apr 4;5(1):306. doi: 10.1038/s42003-022-03243-w.
Bayesian networks (BNs) are disciplined, explainable Artificial Intelligence models that can describe structured joint probability spaces. In the context of understanding complex relations between a number of variables in biological settings, they can be constructed from observed data and can provide a guiding, graphical tool for exploring such relations. Here we propose BNs for elucidating the relations between driver events in large cancer genomic datasets. We present a methodology that is specifically tailored to biologists and clinicians, as they are the main producers of such datasets. We achieve this by using an optimal BN learning algorithm based on well-established likelihood functions and by utilising just two tuning parameters, both of which are easy to set and have intuitive readings. To enhance value to clinicians, we introduce (a) the use of heatmaps for families in each network, and (b) visualising pairwise co-occurrence statistics on the network. For binary data, an optional step of fitting logic gates can be employed. We show how our methodology enhances pairwise testing and how biologists and clinicians can use BNs for discussing the main relations among driver events in large genomic cohorts. We demonstrate the utility of our methodology by applying it to five cancer datasets, revealing complex genomic landscapes. Our networks identify central patterns in all datasets, including a 4-way mutual exclusivity between HDR, t(4,14), t(11,14) and t(14,16) in myeloma, and a 3-way mutual exclusivity of three major players, CALR, JAK2 and MPL, in myeloproliferative neoplasms. These analyses demonstrate that our methodology can play a central role in the study of large genomic cancer datasets.
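As an illustration of the pairwise co-occurrence statistics mentioned above, the following is a minimal Python sketch, not the authors' implementation. It assumes a two-sided Fisher's exact test and a log odds ratio with a 0.5 continuity correction as the pairwise statistics; the abstract does not state which statistics are used. The sample matrix is invented toy data that mimics a mutual-exclusivity pattern and is not drawn from any of the five datasets. The function names (`contingency`, `log_odds_ratio`, `fisher_two_sided`, `pairwise_stats`) are hypothetical.

```python
from itertools import combinations
from math import comb, log

# Toy binary driver-event matrix: one row per sample, one column per event.
# These values are invented for illustration only (assumption, not real data).
events = ["CALR", "JAK2", "MPL"]
samples = [
    (1, 0, 0), (1, 0, 0), (1, 0, 0),
    (0, 1, 0), (0, 1, 0), (0, 1, 0), (0, 1, 0),
    (0, 0, 1), (0, 0, 1),
    (0, 0, 0),
]


def contingency(col_a, col_b):
    """Return (n11, n10, n01, n00) counts for two binary columns."""
    n11 = sum(1 for a, b in zip(col_a, col_b) if a and b)
    n10 = sum(1 for a, b in zip(col_a, col_b) if a and not b)
    n01 = sum(1 for a, b in zip(col_a, col_b) if not a and b)
    n00 = sum(1 for a, b in zip(col_a, col_b) if not a and not b)
    return n11, n10, n01, n00


def log_odds_ratio(n11, n10, n01, n00):
    """Log odds ratio with a 0.5 correction; negative values suggest exclusivity."""
    return log(((n11 + 0.5) * (n00 + 0.5)) / ((n10 + 0.5) * (n01 + 0.5)))


def fisher_two_sided(n11, n10, n01, n00):
    """Two-sided Fisher's exact p-value from the hypergeometric distribution."""
    n = n11 + n10 + n01 + n00
    row1 = n11 + n10  # samples carrying event A
    col1 = n11 + n01  # samples carrying event B

    def pmf(k):
        return comb(col1, k) * comb(n - col1, row1 - k) / comb(n, row1)

    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    p_obs = pmf(n11)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    return sum(pmf(k) for k in range(lo, hi + 1) if pmf(k) <= p_obs * (1 + 1e-9))


def pairwise_stats(event_names, rows):
    """Map each event pair to (log odds ratio, two-sided Fisher p-value)."""
    columns = {name: [row[i] for row in rows] for i, name in enumerate(event_names)}
    stats = {}
    for a, b in combinations(event_names, 2):
        table = contingency(columns[a], columns[b])
        stats[(a, b)] = (log_odds_ratio(*table), fisher_two_sided(*table))
    return stats


if __name__ == "__main__":
    for (a, b), (lor, p) in pairwise_stats(events, samples).items():
        print(f"{a} vs {b}: log odds ratio = {lor:.2f}, p = {p:.3f}")
```

Pairwise statistics of this kind only annotate individual pairs of events; the multi-way patterns described in the abstract come from the learned network structure, which this sketch does not attempt to reproduce.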