Development and validation of a distributed representation model of Japanese high-dimensional administrative claims data for clinical epidemiology studies.

Author information

Matsui Hiroki, Fushimi Kiyohide, Yasunaga Hideo

Affiliations

Department of Clinical Epidemiology and Health Economics, School of Public Health, The University of Tokyo, 7-3-1 Hongo, Bunkyo-Ku, Tokyo, 1130033, Japan.

Department of Health Policy and Informatics, Institute of Science Tokyo Graduate School of Medical and Dental Sciences, 1-5-45 Yushima, Bunkyo-Ku, Tokyo, 1138519, Japan.

Publication information

BMC Med Res Methodol. 2025 Apr 11;25(1):95. doi: 10.1186/s12874-025-02549-7.

DOI:10.1186/s12874-025-02549-7

PMID:40217149

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC11987422/

Abstract

BACKGROUND

Unmeasured confounders pose challenges when observational data are analysed in comparative effectiveness studies. Integrating high-dimensional administrative claims data may help adjust for unmeasured confounders. We determined whether distributed representations can compress high-dimensional administrative claims data to adjust for unmeasured confounders.

METHOD

Using the Japanese Diagnosis Procedure Combination (DPC) database from 1291 hospitals (between April 2018 and March 2020), we applied the word2vec algorithm to create distributed representations for all medical codes. We focused on patients with heart failure (HF) and simulated four risk-adjustment models: 1, no adjustment; 2, adjusting for previously reported confounders; 3, adjusting for the sum of distributed representation weights of administrative claims data on the day of hospitalisation (novel method); and 4, a combination of models 2 and 3. We re-evaluated a previous study on the effect of early rehabilitation in patients with HF and compared these risk-adjustment methods (models 1-4).
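As a rough illustration of the feature construction in model 3, the sketch below sums per-code embedding vectors into one patient-level vector for the day of hospitalisation. It is a minimal sketch under stated assumptions, not the authors' implementation: the code strings, the 3-dimensional vectors, and the `patient_vector` helper are hypothetical, and in practice the embeddings would come from a word2vec model trained on the claims codes rather than a hand-written table.

```python
from typing import Dict, List, Sequence

# Hypothetical toy embeddings: the code names and values are made up for
# illustration and do not come from the DPC database or the paper.
EMBEDDINGS: Dict[str, List[float]] = {
    "I500": [0.2, -0.1, 0.4],
    "DRUG_FUROSEMIDE": [0.1, 0.3, -0.2],
    "PROC_ECHO": [-0.3, 0.2, 0.1],
}


def patient_vector(
    codes: Sequence[str],
    embeddings: Dict[str, List[float]],
    dim: int,
) -> List[float]:
    """Sum the embedding vectors of all codes recorded for one patient.

    Codes without an embedding are skipped (an assumption of this sketch).
    """
    total = [0.0] * dim
    for code in codes:
        vec = embeddings.get(code)
        if vec is None:
            continue  # unknown code: contributes nothing to the sum
        for i, value in enumerate(vec):
            total[i] += value
    return total


# Codes recorded on the (hypothetical) day of hospitalisation.
day0_codes = ["I500", "DRUG_FUROSEMIDE", "PROC_ECHO", "UNKNOWN_CODE"]
features = patient_vector(day0_codes, EMBEDDINGS, dim=3)
print(features)
```

The resulting fixed-length vector could then be entered as covariates in a risk-adjustment model, whatever the number of codes recorded for the patient.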

RESULTS

Distributed representations were generated from the data of 15 998 963 in-patients, and 319 581 HF patients were identified. In the simulation study, Model 3 reduced the impact of unmeasured confounders and achieved better covariate balances than Model 1. Model 4 showed no increase in bias compared with the true model (Model 2) and was used as a reference model in the real-world application. When applied to a previous study, models 3 and 4 showed similar results.

CONCLUSION

Distributed representation can compress detailed administrative claims data and adjust for unmeasured confounders in comparative effectiveness studies.
