Howey Richard, Adam Jonathan, Adamski Jerzy, Atabaki Natalie N, Brunak Søren, Chmura Piotr Jaroslaw, De Masi Federico, Dermitzakis Emmanouil T, Fernandez-Tajes Juan J, Forgie Ian M, Franks Paul W, Giordano Giuseppe N, Haid Mark, Hansen Torben, Hansen Tue H, Harms Peter P, Hattersley Andrew T, Hong Mun-Gwan, Jacobsen Ulrik Plesner, Jones Angus G, Koivula Robert W, Kokkola Tarja, Mahajan Anubha, Mari Andrea, McCarthy Mark I, McDonald Timothy J, Musholt Petra B, Pavo Imre, Pearson Ewan R, Pedersen Oluf, Ruetten Hartmut, Rutters Femke, Schwenk Jochen M, Sharma Sapna, 't Hart Leen M, Vestergaard Henrik, Walker Mark, Viñuela Ana, Cordell Heather J
Research Software Engineering, Newcastle University, Newcastle upon Tyne, United Kingdom.
Population Health Sciences Institute, Faculty of Medical Sciences, Newcastle University, Newcastle upon Tyne, United Kingdom.
PLoS Genet. 2025 Jul 15;21(7):e1011776. doi: 10.1371/journal.pgen.1011776. eCollection 2025 Jul.
Here we report the results of an exploratory analysis, using a Bayesian network approach, of data originally derived from a large North European study of type 2 diabetes (T2D) conducted by the IMI DIRECT consortium. A total of 3029 individuals (795 with T2D and 2234 without) from 7 different study centres provided data comprising genotypes, proteins, metabolites, gene expression measurements and many different clinical variables. The main aim of the current study was to demonstrate the utility of our previously developed method for fitting Bayesian networks by performing exploratory analysis of this dataset to identify possible causal relationships between these variables. The data were analysed using the BayesNetty software package, which can handle mixed discrete/continuous data with missing values. The original dataset consisted of over 16,000 variables, which were filtered down to 260 variables for analysis. Even with this reduction, no individual had complete data for all variables, making the dataset impossible to analyse using standard Bayesian network methodology. However, using the recently proposed imputation method implemented in BayesNetty, we computed a large average Bayesian network from which we could infer possible associations and causal relationships between variables of interest. Our results confirmed many previous findings in connection with T2D, including possible mediating proteins and genes, some of which have not been widely reported. We also confirmed potential causal relationships with liver fat that were identified in an earlier study that used the IMI DIRECT dataset but was limited to a smaller subset of individuals and variables (namely individuals with complete data for pre-defined variables of interest). In addition to providing valuable confirmation, our analyses thus provide a proof of principle of the utility of the method implemented within BayesNetty.
The full final average Bayesian network generated from our analysis is freely available and can be easily interrogated further to address specific focussed scientific questions of interest.
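To make the idea of an "average" network concrete, the following is a minimal sketch in Python. It assumes that an average network is summarised by how often each directed edge appears across a collection of fitted networks (for example, networks fitted to bootstrap resamples), and that only edges above a chosen strength threshold are retained. This is an illustrative assumption, not a description of the BayesNetty implementation. The function name `average_network`, the 0.5 threshold and the toy variables `A`, `B` and `C` are all hypothetical and do not correspond to results from the study.

```python
from collections import Counter


def average_network(networks, threshold=0.5):
    """Summarise a collection of networks as edge strengths.

    `networks` is a list of sets of directed edges (parent, child).
    An edge's strength is the fraction of networks that contain it.
    Only edges with strength >= `threshold` are returned.
    (Hypothetical helper for illustration; not part of BayesNetty.)
    """
    if not networks:
        raise ValueError("need at least one network")
    n = len(networks)
    # Count each edge at most once per network.
    counts = Counter(edge for net in networks for edge in set(net))
    return {edge: c / n for edge, c in counts.items() if c / n >= threshold}


# Toy input: four made-up networks over placeholder variables A, B and C.
toy_networks = [
    {("A", "B"), ("B", "C")},
    {("A", "B"), ("A", "C")},
    {("A", "B"), ("B", "C")},
    {("B", "C")},
]

strengths = average_network(toy_networks, threshold=0.5)
for (parent, child), strength in sorted(strengths.items()):
    print(f"{parent} -> {child}: strength {strength:.2f}")
```

In this toy input the edge from `A` to `C` appears in only one of the four networks, so it falls below the threshold and is dropped. A summary of this kind lets a reader query a large network for the edges that are most consistently supported.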