A Bayesian network approach incorporating imputation of missing data enables exploratory analysis of complex causal biological relationships.

Author affiliations

Population Health Sciences Institute, Newcastle University, Newcastle upon Tyne, United Kingdom.

Translational and Clinical Research Institute, Newcastle University, Newcastle upon Tyne, United Kingdom.

Publication information

PLoS Genet. 2021 Sep 29;17(9):e1009811. doi: 10.1371/journal.pgen.1009811. eCollection 2021 Sep.

DOI:10.1371/journal.pgen.1009811
PMID:34587167
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8504979/
Abstract

Bayesian networks can be used to identify possible causal relationships between variables based on their conditional dependencies and independencies, which can be particularly useful in complex biological scenarios with many measured variables. Here we propose two improvements to an existing method for Bayesian network analysis, designed to increase the power to detect potential causal relationships between variables (including potentially a mixture of both discrete and continuous variables). Our first improvement relates to the treatment of missing data. When there is missing data, the standard approach is to remove every individual with any missing data before performing analysis. This can be wasteful and undesirable when there are many individuals with missing data, perhaps with only one or a few variables missing. This motivates the use of imputation. We present a new imputation method that uses a version of nearest neighbour imputation, whereby missing data from one individual is replaced with data from another individual, their nearest neighbour. For each individual with missing data, the subsets of variables to be used to select the nearest neighbour are chosen by sampling without replacement the complete data and estimating a best fit Bayesian network. We show that this approach leads to marked improvements in the recall and precision of directed edges in the final network identified, and we illustrate the approach through application to data from a recent study investigating the causal relationship between methylation and gene expression in early inflammatory arthritis patients. We also describe a second improvement in the form of a pseudo-Bayesian approach for upweighting certain network edges, which can be useful when there is prior evidence concerning their directions.
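
The nearest neighbour step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes purely continuous variables, a scaled Euclidean distance, and donors that are complete on every variable involved. The function name `nearest_neighbour_impute` and the `match_vars` argument are hypothetical. In the paper the subsets of variables used for matching are derived from best fit Bayesian networks estimated on samples of the complete data; here they are simply supplied by the caller.

```python
import numpy as np

def nearest_neighbour_impute(data, match_vars=None):
    """Replace each row's missing values (NaN) with those of its nearest donor row.

    data       : 2-D array, rows are individuals, columns are variables.
    match_vars : optional dict {row index: column indices used for matching}.
                 Hypothetical stand-in for the network-derived variable subsets;
                 rows not listed are matched on all of their observed columns.
    """
    data = np.asarray(data, dtype=float)
    imputed = data.copy()

    # Scale each column by its observed standard deviation so that no single
    # variable dominates the distance (an assumption of this sketch).
    scale = np.nanstd(data, axis=0)
    scale[~np.isfinite(scale) | (scale == 0)] = 1.0

    for i in np.flatnonzero(np.isnan(data).any(axis=1)):
        missing = np.isnan(data[i])
        match = ~missing  # only observed columns can be used for matching
        if match_vars is not None and int(i) in match_vars:
            chosen = np.zeros(data.shape[1], dtype=bool)
            chosen[list(match_vars[int(i)])] = True
            match &= chosen
        if not match.any():
            continue  # nothing to match on; leave the row unchanged

        # A donor must be observed on the matching columns and on the
        # columns that need filling; row i itself is excluded automatically.
        needed = match | missing
        donors = np.flatnonzero(~np.isnan(data[:, needed]).any(axis=1))
        if donors.size == 0:
            continue  # no suitable donor; leave the row unchanged

        diffs = (data[donors][:, match] - data[i, match]) / scale[match]
        nearest = donors[np.argmin((diffs ** 2).sum(axis=1))]
        imputed[i, missing] = data[nearest, missing]

    return imputed

# Toy data: individual 1 is missing the third variable.
toy = np.array([[1.0, 9.0, 3.0],
                [1.1, 2.1, np.nan],
                [5.0, 2.0, 7.0]])
by_first = nearest_neighbour_impute(toy, match_vars={1: [0]})
by_second = nearest_neighbour_impute(toy, match_vars={1: [1]})
```

In this toy example the donor changes with the matching subset: matching on the first variable selects individual 0, while matching on the second selects individual 2. This is why the choice of variables used to find the nearest neighbour matters for the imputed value.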

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b387/8504979/b27c69433ab9/pgen.1009811.g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b387/8504979/8b499723f50f/pgen.1009811.g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b387/8504979/ba92f30fb256/pgen.1009811.g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b387/8504979/334366da5aa7/pgen.1009811.g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b387/8504979/5656dfccc694/pgen.1009811.g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b387/8504979/26d52acdaf07/pgen.1009811.g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b387/8504979/4d45a8040d64/pgen.1009811.g006.jpg
