Comparison of strategies for scalable causal discovery of latent variable models from mixed data.

Author information

Raghu Vineet K, Ramsey Joseph D, Morris Alison, Manatakis Dimitrios V, Spirtes Peter, Chrysanthis Panos K, Glymour Clark, Benos Panayiotis V

Affiliations

[1] Department of Computer Science, University of Pittsburgh, Pittsburgh, PA, USA.

[3] Department of Philosophy, Carnegie Mellon University, Pittsburgh, PA, USA.

Publication information

Int J Data Sci Anal. 2018;6(1):33-45. doi: 10.1007/s41060-018-0104-3. Epub 2018 Feb 6.

DOI:10.1007/s41060-018-0104-3
PMID:30148202
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6096780/
Abstract

Modern technologies allow large, complex biomedical datasets to be collected from patient cohorts. These datasets comprise both continuous and categorical data ("mixed data"), and essential variables may be unobserved because of the complex nature of biomedical phenomena. Causal inference algorithms can identify important relationships from biomedical data; however, handling causal inference over mixed data with unmeasured confounders in a scalable way is still an open problem. Recent causal discovery strategies could each potentially handle these challenges individually, but no study has yet compared these approaches comprehensively in this setting. In this paper, we address this gap with a comparative study of the accuracy and efficiency of different strategies on large, mixed datasets with latent confounders. We experiment with two extensions of the Fast Causal Inference (FCI) algorithm: a maximum probability search procedure we recently developed to identify causal orientations more accurately, and a strategy that quickly eliminates unlikely adjacencies to scale to high-dimensional data. In simulation experiments on datasets with up to 500 variables, these methods significantly outperform the state of the art in the field, achieving both accurate edge orientations and tractable running times. Finally, we demonstrate the usability of the best-performing approach on real data by applying it to a biomedical dataset of HIV-infected individuals.
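The abstract names two FCI extensions without giving details. As a rough illustration only, the Python sketch below combines (a) a PC/FCI-style adjacency search that only tests edges surviving some fast pre-screen and (b) a collider rule that, for each unshielded triple, picks the conditioning set with the largest p-value rather than the first separating set found. Every name here (`skeleton`, `orient_colliders_max_p`, `oracle_pvalue`, `CANDIDATES`, `INDEP`) is hypothetical. The sketch assumes the pre-screen has already produced `CANDIDATES`, replaces the conditional-independence test with a toy oracle, and omits FCI's possible-D-SEP step and remaining orientation rules. It is not the authors' implementation.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, Set, Tuple

# Hypothetical CI-test interface: p-value for "x is independent of y given s".
PValue = Callable[[str, str, FrozenSet[str]], float]


def skeleton(candidates: Set[FrozenSet[str]], pvalue: PValue,
             alpha: float = 0.05) -> Dict[str, Set[str]]:
    """PC-style adjacency search that only ever tests pre-screened candidate edges."""
    adj: Dict[str, Set[str]] = {}
    for edge in candidates:
        x, y = sorted(edge)
        adj.setdefault(x, set()).add(y)
        adj.setdefault(y, set()).add(x)
    depth = 0
    # Grow the conditioning-set size while some node still has enough neighbours.
    while any(len(nbrs) - 1 >= depth for nbrs in adj.values()):
        for x in sorted(adj):
            for y in sorted(adj[x]):  # sorted() copies, so removal below is safe
                others = sorted(adj[x] - {y})
                for s in combinations(others, depth):
                    if pvalue(x, y, frozenset(s)) > alpha:
                        # Judged independent: drop the edge in both directions.
                        adj[x].discard(y)
                        adj[y].discard(x)
                        break
        depth += 1
    return adj


def orient_colliders_max_p(adj: Dict[str, Set[str]],
                           pvalue: PValue) -> Set[Tuple[str, str, str]]:
    """Label unshielded triples x - z - y as colliders using the max-p-value set."""
    colliders: Set[Tuple[str, str, str]] = set()
    for z in sorted(adj):
        for x, y in combinations(sorted(adj[z]), 2):
            if y in adj[x]:
                continue  # shielded triple: x and y are adjacent, skip it
            best_p, best_s = -1.0, frozenset()
            # Score every subset of adj(x)\{y} and adj(y)\{x}; keep the max p-value.
            for base in (adj[x] - {y}, adj[y] - {x}):
                for k in range(len(base) + 1):
                    for s in combinations(sorted(base), k):
                        p = pvalue(x, y, frozenset(s))
                        if p > best_p:
                            best_p, best_s = p, frozenset(s)
            if z not in best_s:
                colliders.add((x, z, y))  # read as x *-> z <-* y
    return colliders


# Toy truth (hypothetical): X -> Z <- Y and Z -> W. The oracle returns a high
# p-value only for the d-separation facts that hold in that graph.
INDEP = {
    (frozenset({"X", "Y"}), frozenset()),
    (frozenset({"X", "W"}), frozenset({"Z"})),
    (frozenset({"X", "W"}), frozenset({"Z", "Y"})),
    (frozenset({"Y", "W"}), frozenset({"Z"})),
    (frozenset({"Y", "W"}), frozenset({"Z", "X"})),
}


def oracle_pvalue(x: str, y: str, s: FrozenSet[str]) -> float:
    return 0.9 if (frozenset((x, y)), s) in INDEP else 0.001


# Hypothetical pre-screen output: the X - W pair was already eliminated.
CANDIDATES = {frozenset(p) for p in
              [("X", "Z"), ("Y", "Z"), ("Z", "W"), ("X", "Y"), ("Y", "W")]}

if __name__ == "__main__":
    adj = skeleton(CANDIDATES, oracle_pvalue)
    print({k: sorted(v) for k, v in sorted(adj.items())})
    print(orient_colliders_max_p(adj, oracle_pvalue))
```

With this toy oracle the search keeps X - Z, Y - Z and Z - W and labels only `X *-> Z <-* Y` a collider. Scoring all candidate sets and keeping the highest p-value is meant to make each collider decision less dependent on a single borderline test; with real mixed data, `oracle_pvalue` would have to be replaced by a conditional-independence test suited to continuous and categorical variables.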

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1b4a/6096780/a1ca4c5179c1/41060_2018_104_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1b4a/6096780/d6c2c273ddb8/41060_2018_104_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1b4a/6096780/9aa343c2b277/41060_2018_104_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1b4a/6096780/afd377ade02e/41060_2018_104_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1b4a/6096780/b043c6d3f9a1/41060_2018_104_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1b4a/6096780/21d057ef0412/41060_2018_104_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1b4a/6096780/def82abb5bd1/41060_2018_104_Fig6_HTML.jpg

Similar articles

1. Comparison of strategies for scalable causal discovery of latent variable models from mixed data.
   Int J Data Sci Anal. 2018;6(1):33-45. doi: 10.1007/s41060-018-0104-3. Epub 2018 Feb 6.
2. Using Domain Knowledge to Overcome Latent Variables in Causal Inference from Time Series.
   Proc Mach Learn Res. 2019 Aug;106:474-489.
3. An algorithm for direct causal learning of influences on patient outcomes.
   Artif Intell Med. 2017 Jan;75:1-15. doi: 10.1016/j.artmed.2016.10.003. Epub 2016 Nov 5.
4. Causal discoveries for high dimensional mixed data.
   Stat Med. 2022 Oct 30;41(24):4924-4940. doi: 10.1002/sim.9544. Epub 2022 Aug 15.
5. Scalable Causal Structure Learning: Scoping Review of Traditional and Deep Learning Algorithms and New Opportunities in Biomedicine.
   JMIR Med Inform. 2023 Jan 17;11:e38266. doi: 10.2196/38266.
6. A Fast PC Algorithm for High Dimensional Causal Discovery with Multi-Core PCs.
   IEEE/ACM Trans Comput Biol Bioinform. 2019 Sep-Oct;16(5):1483-1495. doi: 10.1109/TCBB.2016.2591526. Epub 2016 Jul 14.
7. Causal Discovery in Linear Non-Gaussian Acyclic Model With Multiple Latent Confounders.
   IEEE Trans Neural Netw Learn Syst. 2022 Jul;33(7):2816-2827. doi: 10.1109/TNNLS.2020.3045812. Epub 2022 Jul 6.
8. Causal Discovery in High-dimensional, Multicollinear Datasets.
   Front Epidemiol. 2022;2. doi: 10.3389/fepid.2022.899655. Epub 2022 Sep 13.
9. Sophisticated Merging Over Random Partitions: A Scalable and Robust Causal Discovery Approach.
   IEEE Trans Neural Netw Learn Syst. 2018 Aug;29(8):3623-3635. doi: 10.1109/TNNLS.2017.2734804. Epub 2017 Aug 24.
10. Evaluation of Causal Structure Learning Methods on Mixed Data Types.
   Proc Mach Learn Res. 2018 Aug;92:48-65.

Cited by

1. Identifying pathways to cardiovascular mortality by causal graphical models and mediation analysis among hypertensive patients: insights from a prospective study.
   J Transl Med. 2025 Jun 19;23(1):690. doi: 10.1186/s12967-025-06755-1.
2. Investigating causal networks of dementia using causal discovery and natural language processing models.
   NPJ Dement. 2025;1(1):4. doi: 10.1038/s44400-025-00006-2. Epub 2025 May 9.
3. Pre-vaccination transcriptomic profiles of immune responders to the MUC1 peptide vaccine for colon cancer prevention.
   Front Immunol. 2024 Oct 10;15:1437391. doi: 10.3389/fimmu.2024.1437391. eCollection 2024.
4. Identification of factors directly linked to incident chronic obstructive pulmonary disease: A causal graph modeling study.
   PLoS Med. 2024 Aug 13;21(8):e1004444. doi: 10.1371/journal.pmed.1004444. eCollection 2024 Aug.
5. A plasma peptidomic signature reveals extracellular matrix remodeling and predicts prognosis in alcohol-associated hepatitis.
   Hepatol Commun. 2024 Jul 31;8(8). doi: 10.1097/HC9.0000000000000510. eCollection 2024 Aug 1.
6. Longitudinal multicompartment characterization of host-microbiota interactions in patients with acute respiratory failure.
   Nat Commun. 2024 Jun 3;15(1):4708. doi: 10.1038/s41467-024-48819-8.
7. Pre-vaccination transcriptomic profiles of immune responders to the MUC1 peptide vaccine for colon cancer prevention.
   medRxiv. 2024 May 10:2024.05.09.24305336. doi: 10.1101/2024.05.09.24305336.
8. Pancreatic quantitative sensory testing to predict treatment response of endoscopic therapy or surgery for painful chronic pancreatitis with pancreatic duct obstruction: study protocol for an observational clinical trial.
   BMJ Open. 2024 Mar 21;14(3):e081505. doi: 10.1136/bmjopen-2023-081505.
9. Integrated BATF transcriptional network regulates suppressive intratumoral regulatory T cells.
   Sci Immunol. 2023 Sep 15;8(87):eadf6717. doi: 10.1126/sciimmunol.adf6717.
10. Integrated unbiased multiomics defines disease-independent placental clusters in common obstetrical syndromes.
   BMC Med. 2023 Sep 8;21(1):349. doi: 10.1186/s12916-023-03054-8.

References

1. A Hybrid Causal Search Algorithm for Latent Variable Models.
   JMLR Workshop Conf Proc. 2016 Aug;52:368-379.
2. COPD in HIV-Infected Patients: CD4 Cell Count Highly Correlated.
   PLoS One. 2017 Jan 5;12(1):e0169359. doi: 10.1371/journal.pone.0169359. eCollection 2017.
3. A Fast PC Algorithm for High Dimensional Causal Discovery with Multi-Core PCs.
   IEEE/ACM Trans Comput Biol Bioinform. 2019 Sep-Oct;16(5):1483-1495. doi: 10.1109/TCBB.2016.2591526. Epub 2016 Jul 14.
4. Learning mixed graphical models with separate sparsity parameters and stability-based model selection.
   BMC Bioinformatics. 2016 Jun 6;17 Suppl 5(Suppl 5):175. doi: 10.1186/s12859-016-1039-0.
5. Big Data: Astronomical or Genomical?
   PLoS Biol. 2015 Jul 7;13(7):e1002195. doi: 10.1371/journal.pbio.1002195. eCollection 2015 Jul.
6. Learning the Structure of Mixed Graphical Models.
   J Comput Graph Stat. 2015 Jan 1;24(1):230-253. doi: 10.1080/10618600.2014.900500.
7. Lung cancer in HIV patients and their parents: a Danish cohort study.
   BMC Cancer. 2011 Jun 25;11:272. doi: 10.1186/1471-2407-11-272.