Discovering causal interactions using Bayesian network scoring and information gain.

Author information

Zeng Zexian, Jiang Xia, Neapolitan Richard

Affiliations

Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL, USA.

Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, USA.

Publication information

BMC Bioinformatics. 2016 May 26;17(1):221. doi: 10.1186/s12859-016-1084-8.

Abstract

BACKGROUND

The problem of learning causal influences from data has recently attracted much attention. Standard statistical methods can have difficulty learning discrete causes that interact to affect a target, because the assumptions in these methods often do not model discrete causal relationships well. An important task then is to learn such interactions from data. Motivated by the problem of learning epistatic interactions from datasets developed in genome-wide association studies (GWAS), researchers conceived new methods for learning discrete interactions. However, many of these methods do not differentiate a model representing a true interaction from a model representing non-interacting causes with strong individual effects. The recent algorithm MBS-IGain addresses this difficulty by using Bayesian network learning and information gain to discover interactions from high-dimensional datasets. However, MBS-IGain requires marginal effects to detect interactions containing more than two causes. If the dataset is not high-dimensional, we can avoid this shortcoming by doing an exhaustive search.

RESULTS

We develop Exhaustive-IGain, which is like MBS-IGain but does an exhaustive search. We compare the performance of Exhaustive-IGain to that of MBS-IGain using low-dimensional simulated datasets based on interactions with marginal effects and ones based on interactions without marginal effects. Their performance is similar on the datasets based on interactions with marginal effects. However, Exhaustive-IGain compellingly outperforms MBS-IGain on the datasets based on 3-cause and 4-cause interactions without marginal effects. We apply Exhaustive-IGain to investigate how clinical variables interact to affect breast cancer survival, and obtain results that agree with the judgements of a breast cancer oncologist.

CONCLUSIONS

We conclude that the combined use of information gain and Bayesian network scoring enables us to discover higher-order interactions with no marginal effects if we perform an exhaustive search. We further conclude that Exhaustive-IGain can be effective when applied to real data.
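
To make the idea of an interaction without marginal effects concrete, here is a minimal Python sketch of an exhaustive search that ranks candidate cause sets by an information-gain measure. It is not the authors' Exhaustive-IGain implementation: it omits the Bayesian network scoring step entirely, and the gain definition (mutual information of the whole set with the target, minus the best sum achievable by splitting the set into two parts) is an assumption chosen for illustration. All function names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log2


def mutual_information(rows, cols, target):
    """Empirical mutual information I(T; X_cols) in bits for discrete data."""
    n = len(rows)
    joint = Counter((tuple(r[c] for c in cols), r[target]) for r in rows)
    px, pt = Counter(), Counter()
    for (x, t), k in joint.items():
        px[x] += k
        pt[t] += k
    return sum(k / n * log2(k * n / (px[x] * pt[t]))
               for (x, t), k in joint.items())


def bipartitions(cols):
    """Yield every split of cols into two non-empty parts, each split once."""
    first, rest = cols[0], cols[1:]
    for r in range(len(rest)):  # never take all of rest, so part b is non-empty
        for extra in combinations(rest, r):
            a = (first,) + extra
            b = tuple(c for c in rest if c not in extra)
            yield a, b


def interaction_gain(rows, cols, target):
    """Assumed gain: I(T; whole set) minus the best two-part decomposition."""
    whole = mutual_information(rows, cols, target)
    best_split = max(
        mutual_information(rows, a, target) + mutual_information(rows, b, target)
        for a, b in bipartitions(cols)
    )
    return whole - best_split


def exhaustive_search(rows, n_predictors, target, max_size=3):
    """Score every predictor subset of size 2..max_size; highest gain first."""
    scored = [
        (interaction_gain(rows, cols, target), cols)
        for size in range(2, max_size + 1)
        for cols in combinations(range(n_predictors), size)
    ]
    return sorted(scored, reverse=True)


# Toy data: the target (column 3) is the XOR of columns 0 and 1, so neither
# cause has a marginal effect on its own; column 2 is irrelevant.
rows = [(a, b, c, a ^ b) for a in (0, 1) for b in (0, 1) for c in (0, 1)]
ranking = exhaustive_search(rows, n_predictors=3, target=3)
print(ranking[0])
```

On this toy dataset the pair of XOR inputs ranks first, while the triple that adds the irrelevant column gains nothing over that pair. A greedy search that starts from single variables with marginal effects would have no signal to follow here, which is the situation the exhaustive search is meant to handle. The number of subsets grows combinatorially with the number of variables, so this approach is practical only for low-dimensional data.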

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/01da/4880828/aa976ed134e5/12859_2016_1084_Fig1_HTML.jpg