Machine learning for improving high-dimensional proxy confounder adjustment in healthcare database studies: An overview of the current literature.

Affiliations

Division of Pharmacoepidemiology and Pharmacoeconomics, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, USA.

KI Research Institute, Kfar Malal, Israel.

Publication information

Pharmacoepidemiol Drug Saf. 2022 Sep;31(9):932-943. doi: 10.1002/pds.5500. Epub 2022 Jul 5.

DOI:10.1002/pds.5500
PMID:35729705
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9541861/
Abstract

PURPOSE

Supplementing investigator-specified variables with large numbers of empirically identified features that collectively serve as 'proxies' for unspecified or unmeasured factors can often improve confounding control in studies utilizing administrative healthcare databases. Consequently, there has been a recent focus on the development of data-driven methods for high-dimensional proxy confounder adjustment in pharmacoepidemiologic research. In this paper, we survey current approaches and recent advancements for high-dimensional proxy confounder adjustment in healthcare database studies.

METHODS

We discuss considerations underpinning three areas for high-dimensional proxy confounder adjustment: (1) feature generation, that is, transforming raw data into covariates (or features) to be used for proxy adjustment; (2) covariate prioritization, selection, and adjustment; and (3) diagnostic assessment. We discuss challenges and avenues of future development within each area.

RESULTS

There is a large literature on methods for high-dimensional confounder prioritization/selection, but relatively little has been written on best practices for feature generation and diagnostic assessment. Consequently, these areas have particular limitations and challenges.

CONCLUSIONS

There is a growing body of evidence showing that machine-learning algorithms for high-dimensional proxy-confounder adjustment can supplement investigator-specified variables to improve confounding control compared to adjustment based on investigator-specified variables alone. However, more research is needed on best practices for feature generation and diagnostic assessment when applying methods for high-dimensional proxy confounder adjustment in pharmacoepidemiologic studies.
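
As a rough illustration of the three areas named under METHODS, the sketch below walks a toy dataset through feature generation, covariate prioritization, and one diagnostic. It is not taken from the paper. The function names, the toy records, the `min_patients` threshold, the Bross-style bias multiplier used for ranking, the 0.5 continuity correction, and the standardized difference used as the diagnostic are all assumptions chosen for illustration.

```python
import math
from collections import defaultdict


def generate_features(records, min_patients=2):
    """Turn raw (patient_id, code) rows into binary indicator features.

    Each feature is the set of patients who have the code at least once.
    Codes seen in fewer than `min_patients` patients are dropped
    (hypothetical threshold, not from the paper).
    """
    patients_by_code = defaultdict(set)
    for patient_id, code in records:
        patients_by_code[code].add(patient_id)
    return {code: ids for code, ids in patients_by_code.items()
            if len(ids) >= min_patients}


def _prevalence(ids, group):
    """Share of `group` that carries the feature."""
    return len(ids & group) / len(group) if group else 0.0


def bross_multiplier(ids, exposed, unexposed, cases):
    """Bross-style bias multiplier for one binary covariate (assumed ranking rule)."""
    everyone = exposed | unexposed
    p_c1 = _prevalence(ids, exposed)
    p_c0 = _prevalence(ids, unexposed)
    with_cov = ids & everyone
    without_cov = everyone - ids
    # A 0.5 continuity correction keeps the covariate-outcome risk ratio finite.
    risk_with = (len(cases & with_cov) + 0.5) / (len(with_cov) + 1)
    risk_without = (len(cases & without_cov) + 0.5) / (len(without_cov) + 1)
    rr_cd = risk_with / risk_without
    return (p_c1 * (rr_cd - 1) + 1) / (p_c0 * (rr_cd - 1) + 1)


def prioritize(features, exposed, unexposed, cases, top_k):
    """Rank features by |log(bias multiplier)| and keep the top_k codes."""
    scored = {
        code: abs(math.log(bross_multiplier(ids, exposed, unexposed, cases)))
        for code, ids in features.items()
    }
    return sorted(scored, key=scored.get, reverse=True)[:top_k]


def standardized_difference(ids, exposed, unexposed):
    """Standardized difference in feature prevalence between exposure groups."""
    p1 = _prevalence(ids, exposed)
    p0 = _prevalence(ids, unexposed)
    pooled = math.sqrt((p1 * (1 - p1) + p0 * (1 - p0)) / 2)
    if pooled == 0:
        return 0.0 if p1 == p0 else math.copysign(math.inf, p1 - p0)
    return (p1 - p0) / pooled


# Toy data (entirely made up): eight patients, three codes.
exposed = {1, 2, 3, 4}
unexposed = {5, 6, 7, 8}
cases = {1, 2, 5}
records = [
    (1, "dx:A"), (2, "dx:A"), (3, "dx:A"), (5, "dx:A"),
    (1, "rx:B"), (2, "rx:B"), (5, "rx:B"), (6, "rx:B"),
    (7, "dx:C"),
]

features = generate_features(records)
selected = prioritize(features, exposed, unexposed, cases, top_k=1)
for code in selected:
    print(code, round(standardized_difference(features[code], exposed, unexposed), 3))
```

In this toy setup, `rx:B` is equally common in both exposure groups, so its multiplier is 1 and it ranks last, while `dx:C` never becomes a feature because only one patient has it. A real analysis would follow the selection step with an adjustment model (for example a propensity score) and re-check balance afterwards; that step is omitted here.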

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0417/9541861/1c777a48c311/PDS-31-932-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0417/9541861/7cbeea32441c/PDS-31-932-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0417/9541861/98197fc39c4a/PDS-31-932-g003.jpg

