Development and validation of a distributed representation model of Japanese high-dimensional administrative claims data for clinical epidemiology studies.

Authors

Matsui Hiroki, Fushimi Kiyohide, Yasunaga Hideo

Affiliations

Department of Clinical Epidemiology and Health Economics, School of Public Health, The University of Tokyo, 7-3-1 Hongo, Bunkyo-Ku, Tokyo, 1130033, Japan.

Department of Health Policy and Informatics, Institute of Science Tokyo Graduate School of Medical and Dental Sciences, 1-5-45 Yushima, Bunkyo-Ku, Tokyo, 1138519, Japan.

Publication information

BMC Med Res Methodol. 2025 Apr 11;25(1):95. doi: 10.1186/s12874-025-02549-7.

DOI:10.1186/s12874-025-02549-7
PMID:40217149
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11987422/
Abstract

BACKGROUND

Unmeasured confounders pose challenges when observational data are analysed in comparative effectiveness studies. Integrating high-dimensional administrative claims data may help adjust for unmeasured confounders. We determined whether distributed representations can compress high-dimensional administrative claims data to adjust for unmeasured confounders.

METHOD

Using the Japanese Diagnosis Procedure Combination (DPC) database from 1291 hospitals (between April 2018 and March 2020), we applied the word2vec algorithm to create distributed representations for all medical codes. We focused on patients with heart failure (HF) and simulated four risk-adjustment models: 1, no adjustment; 2, adjusting for previously reported confounders; 3, adjusting for the sum of distributed representation weights of administrative claims data on the day of hospitalisation (novel method); and 4, a combination of models 2 and 3. We re-evaluated a previous study on the effect of early rehabilitation in patients with HF and compared these risk-adjustment methods (models 1-4).
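The novel covariate in model 3 (the sum of distributed representation weights for the codes recorded on the day of hospitalisation) can be illustrated with a minimal sketch. The embedding values, the code names, and the helper `patient_vector` below are hypothetical and chosen only for illustration; in the study the vectors would come from a word2vec model trained on the DPC database, whose dimensions and settings are not given in the abstract.

```python
# Hypothetical toy embeddings (3 dimensions). Real vectors would be
# learned by word2vec from the claims data; these values are invented.
EMBEDDINGS = {
    "DX_HEART_FAILURE": [0.2, -0.1, 0.4],
    "RX_DIURETIC": [0.1, 0.3, -0.2],
    "PROC_ECHO": [-0.3, 0.2, 0.1],
}


def patient_vector(codes, embeddings, dim=3):
    """Sum the embedding vectors of all codes recorded on the admission day.

    Codes that are missing from the vocabulary are skipped (an assumption;
    the abstract does not say how unseen codes are handled).
    """
    total = [0.0] * dim
    for code in codes:
        vec = embeddings.get(code)
        if vec is None:
            continue
        total = [t + v for t, v in zip(total, vec)]
    return total


# Codes recorded for one hypothetical patient on the day of hospitalisation.
admission_day_codes = ["DX_HEART_FAILURE", "RX_DIURETIC", "NOT_IN_VOCAB"]
x = patient_vector(admission_day_codes, EMBEDDINGS)
```

The resulting fixed-length vector `x` could then be entered as a set of covariates in a risk-adjustment model, regardless of how many codes a patient has on the admission day.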

RESULTS

Distributed representations were generated from the data of 15 998 963 in-patients, and 319 581 HF patients were identified. In the simulation study, Model 3 reduced the impact of unmeasured confounders and achieved better covariate balances than Model 1. Model 4 showed no increase in bias compared with the true model (Model 2) and was used as a reference model in the real-world application. When applied to a previous study, models 3 and 4 showed similar results.
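Covariate balance between treated and control groups is often summarised with the standardised mean difference (SMD). The abstract does not state which balance metric the authors used, so the sketch below is an assumption that shows one common choice, with invented toy data and a hypothetical function name.

```python
import math
import statistics


def standardized_mean_difference(treated, control):
    """SMD for one continuous covariate, using the pooled sample variance."""
    mean_diff = statistics.mean(treated) - statistics.mean(control)
    pooled_sd = math.sqrt(
        (statistics.variance(treated) + statistics.variance(control)) / 2
    )
    return mean_diff / pooled_sd


# Invented toy covariate values for two small groups.
treated_values = [1.0, 2.0, 3.0]
control_values = [2.0, 3.0, 4.0]
smd = standardized_mean_difference(treated_values, control_values)
```

An absolute SMD closer to zero after adjustment indicates better balance for that covariate.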

CONCLUSION

Distributed representation can compress detailed administrative claims data and adjust for unmeasured confounders in comparative effectiveness studies.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5279/11987422/610f84ec2c83/12874_2025_2549_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5279/11987422/c3a957a412b7/12874_2025_2549_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5279/11987422/642d63454547/12874_2025_2549_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5279/11987422/46fb36af61b2/12874_2025_2549_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5279/11987422/8578e9103c6a/12874_2025_2549_Fig5_HTML.jpg
