Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania, USA.
Joint CMU-Pitt PhD Program in Computational Biology, University of Pittsburgh, Pittsburgh, Pennsylvania, USA.
Bioinformatics. 2019 Apr 1;35(7):1204-1212. doi: 10.1093/bioinformatics/bty769.
Integration of data from different modalities is a necessary step for multi-scale data analysis in many fields, including biomedical research and systems biology. Directed graphical models offer an attractive tool for this problem because they can represent both complex multivariate probability distributions and the causal pathways influencing the system. Graphical models learned from biomedical data can be used for classification, biomarker selection and functional analysis, while revealing the underlying network structure and thus allowing arbitrary likelihood queries over the data.
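To make the likelihood-query point concrete: a directed graphical model factorizes the joint distribution into each variable's conditional distribution given its parents, p(x_1, ..., x_n) = ∏_i p(x_i | pa(x_i)). Over mixed data, a continuous node can use a Gaussian conditional and a discrete node a logistic (or multinomial) one. The sketch below is a hand-specified toy, not the CausalMGM model or API. The graph (smoker → fev1 → copd), the variable names and all parameters are hypothetical, chosen only to show how the log-likelihood of one observation is the sum of per-node conditional log-probabilities.

```python
import math

# Hypothetical toy DAG over mixed data types (illustrative only, not from CausalMGM):
#   smoker (binary) -> fev1 (continuous) -> copd (binary)

def log_bernoulli(x, p):
    """Log-probability of a binary value x (0 or 1) under Bernoulli(p)."""
    return math.log(p if x == 1 else 1.0 - p)

def log_gaussian(x, mean, sd):
    """Log-density of x under Normal(mean, sd^2)."""
    z = (x - mean) / sd
    return -0.5 * z * z - math.log(sd * math.sqrt(2.0 * math.pi))

def sigmoid(t):
    """Logistic link used for the binary child's conditional probability."""
    return 1.0 / (1.0 + math.exp(-t))

def log_likelihood(obs):
    """Log of the factorized joint: sum of log p(node | parents) over the DAG."""
    ll = log_bernoulli(obs["smoker"], 0.3)                              # p(smoker)
    ll += log_gaussian(obs["fev1"], 3.0 - 0.5 * obs["smoker"], 0.6)     # p(fev1 | smoker)
    ll += log_bernoulli(obs["copd"], sigmoid(4.0 - 2.0 * obs["fev1"]))  # p(copd | fev1)
    return ll

print(log_likelihood({"smoker": 1, "fev1": 2.5, "copd": 1}))
```

Because each factor is a proper conditional, summing the joint over the values of a leaf node (here copd) recovers the joint of the remaining nodes; in a learned model the structure and parameters would come from the data rather than being fixed by hand.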
In this paper, we present and test new methods for learning directed graphs over mixed data types (continuous and discrete variables). We used the new algorithm, CausalMGM, to identify variables directly linked to disease diagnosis and progression in various multi-modal datasets, including clinical datasets from chronic obstructive pulmonary disease (COPD). COPD is the third leading cause of death and a major cause of disability; determining the factors that cause longitudinal lung function decline is therefore very important. Applied to a COPD dataset, mixed graphical models were able to confirm and extend previously described causal effects and to provide new insights into the factors that potentially affect the longitudinal lung function decline of COPD patients.
The CausalMGM package is available at http://www.causalmgm.org.
Supplementary data are available at Bioinformatics online.