Discovering causal interactions using Bayesian network scoring and information gain.

Author information

Zeng Zexian, Jiang Xia, Neapolitan Richard

Affiliations

Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL, USA.

Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, USA.

Publication information

BMC Bioinformatics. 2016 May 26;17(1):221. doi: 10.1186/s12859-016-1084-8.

DOI:10.1186/s12859-016-1084-8

PMID:27230078

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC4880828/

Abstract

BACKGROUND

The problem of learning causal influences from data has recently attracted much attention. Standard statistical methods can have difficulty learning discrete causes that interact to affect a target, because the assumptions in these methods often do not model discrete causal relationships well. An important task, then, is to learn such interactions from data. Motivated by the problem of learning epistatic interactions from datasets developed in genome-wide association studies (GWAS), researchers conceived new methods for learning discrete interactions. However, many of these methods do not differentiate a model representing a true interaction from a model representing non-interacting causes with strong individual effects. The recent algorithm MBS-IGain addresses this difficulty by using Bayesian network learning and information gain to discover interactions from high-dimensional datasets. However, MBS-IGain requires marginal effects to detect interactions containing more than two causes. If the dataset is not high-dimensional, we can avoid this shortcoming by doing an exhaustive search.
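To make "interaction without marginal effects" concrete, the following is a minimal sketch, not the authors' code. It assumes information gain is measured as the reduction in the empirical entropy of the target, IG(T; A) = H(T) - H(T | A), and it uses a hypothetical XOR dataset; the function names `entropy` and `info_gain` are illustrative only.

```python
from collections import Counter
from itertools import product
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def info_gain(rows, cause_idx, target_idx):
    """IG(T; A) = H(T) - H(T | A) for the causes at column positions `cause_idx`."""
    target = [row[target_idx] for row in rows]
    # Group target values by the joint configuration of the chosen causes.
    groups = {}
    for row in rows:
        key = tuple(row[i] for i in cause_idx)
        groups.setdefault(key, []).append(row[target_idx])
    # Conditional entropy: group entropies weighted by group frequency.
    conditional = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy(target) - conditional


# Hypothetical toy data (an assumption, not from the paper): T = X0 XOR X1.
rows = [(a, b, a ^ b) for a, b in product([0, 1], repeat=2)] * 25

gain_x0 = info_gain(rows, [0], 2)       # X0 alone
gain_x1 = info_gain(rows, [1], 2)       # X1 alone
gain_pair = info_gain(rows, [0, 1], 2)  # X0 and X1 jointly
```

On this toy data each cause alone yields zero gain, while the pair yields one full bit, which is the situation in which a search that first looks for marginal effects finds nothing to build on.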

RESULTS

We develop Exhaustive-IGain, which is like MBS-IGain but does an exhaustive search. We compare the performance of Exhaustive-IGain to MBS-IGain using low-dimensional simulated datasets based on interactions with marginal effects and ones based on interactions without marginal effects. Their performance is similar on the datasets based on marginal effects. However, Exhaustive-IGain compellingly outperforms MBS-IGain on the datasets based on 3-cause and 4-cause interactions without marginal effects. We apply Exhaustive-IGain to investigate how clinical variables interact to affect breast cancer survival, and obtain results that agree with the judgements of a breast cancer oncologist.
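The idea of scoring every candidate cause set can be sketched as below. This is an illustration under stated assumptions, not the Exhaustive-IGain algorithm itself: the abstract does not name the Bayesian network score, so a BDeu score (with equivalent sample size `alpha`) for the model "candidate causes are parents of the target" is assumed here, the information-gain step is omitted, and the names `bdeu_score` and `exhaustive_search` as well as the toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations, product
from math import lgamma, prod


def bdeu_score(rows, parents, target, alpha=1.0):
    """Log BDeu score of the target column given the `parents` columns.

    Assumption: cardinalities are taken from the values observed in `rows`.
    Unobserved configurations contribute zero and are therefore skipped.
    """
    r = len({row[target] for row in rows})                    # target states
    q = prod(len({row[p] for row in rows}) for p in parents)  # parent configs
    n_j = Counter(tuple(row[p] for p in parents) for row in rows)
    n_jk = Counter((tuple(row[p] for p in parents), row[target]) for row in rows)
    a_j, a_jk = alpha / q, alpha / (q * r)
    score = sum(lgamma(a_j) - lgamma(a_j + n) for n in n_j.values())
    score += sum(lgamma(a_jk + n) - lgamma(a_jk) for n in n_jk.values())
    return score


def exhaustive_search(rows, n_causes, target, max_size):
    """Score every cause subset of size 1..max_size; best score first."""
    scored = []
    for size in range(1, max_size + 1):
        for subset in combinations(range(n_causes), size):
            scored.append((bdeu_score(rows, subset, target), subset))
    return sorted(scored, reverse=True)


# Hypothetical toy data: T = X0 XOR X1 XOR X2, with X3 irrelevant.
rows = [(a, b, c, d, a ^ b ^ c)
        for a, b, c, d in product([0, 1], repeat=4)] * 10

ranked = exhaustive_search(rows, n_causes=4, target=4, max_size=3)
best_score, best_subset = ranked[0]
```

Because no single cause or pair in this toy data carries any signal, only a search that evaluates the 3-cause set directly can rank it first. The cost is that the number of subsets grows combinatorially with the number of variables, which is why the abstract restricts exhaustive search to datasets that are not high-dimensional.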

CONCLUSIONS

We conclude that the combined use of information gain and Bayesian network scoring enables us to discover higher-order interactions with no marginal effects if we perform an exhaustive search. We further conclude that Exhaustive-IGain can be effective when applied to real data.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/01da/4880828/aa976ed134e5/12859_2016_1084_Fig1_HTML.jpg
