DisCo-HD: Distributed causal inference with covariates shift for analyzing real-world high-dimensional data.

Authors

Tong Jiayi, Hu Jie, Hripcsak George, Ning Yang, Chen Yong

Affiliations

Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, PA 19104, USA.

Department of Biomedical Informatics, Columbia University, New York, NY 10027, USA.

Publication information

J Mach Learn Res. 2025;26.

Abstract

High-dimensional healthcare data, such as electronic health records (EHR) data and claims data, present two primary challenges: the large number of variables and the need to consolidate data from multiple clinical sites. A third key challenge is potential heterogeneity across sites in the form of covariate shift. In this paper, we propose a distributed learning algorithm, named DisCo-HD, that accounts for covariate shift when estimating the average treatment effect (ATE) from high-dimensional data. Leveraging the surrogate likelihood method, our method calibrates the estimates of the propensity score and outcome models to approximately attain the desired covariate balancing property, while accounting for the covariate shift across multiple clinical sites. We show that our distributed covariate balancing propensity score estimator can approximate the pooled estimator, which is obtained by pooling the data from multiple sites together. The proposed estimator remains consistent if either the propensity score model or the outcome regression model is correctly specified. The semiparametric efficiency bound is achieved when both the propensity score and the outcome models are correctly specified. We conduct simulation studies to demonstrate the performance of the proposed algorithm; additionally, we apply the algorithm to a real-world data set to demonstrate its validity and readiness for implementation.
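The double robustness property stated in the abstract (consistency when either the propensity score model or the outcome model is correct) can be illustrated with a small simulation. The sketch below is not DisCo-HD: it uses the standard augmented inverse-probability-weighted (AIPW) estimator on a single, low-dimensional, simulated data set, with no distributed computation, surrogate likelihood, or covariate balancing calibration. The function names `aipw_ate` and `simulate`, the data-generating model, and the true ATE of 2.0 are all assumptions made for illustration.

```python
import numpy as np


def aipw_ate(y, t, e_hat, m1_hat, m0_hat):
    """Standard AIPW estimate of the average treatment effect (ATE).

    y: outcomes; t: binary treatment indicator (0/1);
    e_hat: estimated propensity scores P(T=1 | X);
    m1_hat, m0_hat: outcome-model predictions under treatment and control.
    """
    y = np.asarray(y, dtype=float)
    t = np.asarray(t, dtype=float)
    m1_hat = np.asarray(m1_hat, dtype=float)
    m0_hat = np.asarray(m0_hat, dtype=float)
    # Clip propensities away from 0 and 1 to avoid division by zero.
    e_hat = np.clip(np.asarray(e_hat, dtype=float), 1e-6, 1 - 1e-6)
    # Outcome-model prediction plus an inverse-probability-weighted residual.
    treated = m1_hat + t * (y - m1_hat) / e_hat
    control = m0_hat + (1 - t) * (y - m0_hat) / (1 - e_hat)
    return float(np.mean(treated - control))


def simulate(n=20000, seed=0):
    """Hypothetical confounded data with a true ATE of 2.0."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)                      # single confounder
    e = 1.0 / (1.0 + np.exp(-0.8 * x))          # true propensity score
    t = rng.binomial(1, e)                      # treatment depends on x
    y = 2.0 * t + 1.5 * x + rng.normal(size=n)  # outcome depends on t and x
    return x, t, y, e


if __name__ == "__main__":
    x, t, y, e = simulate()
    zeros = np.zeros_like(x)
    half = np.full_like(x, 0.5)

    # Correct propensity score, misspecified (all-zero) outcome model.
    print("PS correct only:     ", aipw_ate(y, t, e, zeros, zeros))
    # Misspecified (constant) propensity score, correct outcome model.
    print("Outcome correct only:", aipw_ate(y, t, half, 2.0 + 1.5 * x, 1.5 * x))
    # Both misspecified: the estimate is no longer protected from bias.
    print("Both misspecified:   ", aipw_ate(y, t, half, zeros, zeros))
```

In this simulated setting the first two estimates should land close to the true value of 2.0, while the third should be visibly biased, which is the behavior the double robustness statement describes.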

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cd65/12269483/50142d53d588/nihms-2087339-f0001.jpg
