Kim Dong Wook
Department of Information and Statistics, Department of Bio & Medical Big Data, Research Institute of Natural Science, Gyeongsang National University, Jinju, Korea.
J Korean Med Sci. 2025 Mar 3;40(8):e110. doi: 10.3346/jkms.2025.40.e110.
The use of health insurance claims data has expanded significantly, enabling researchers to conduct epidemiological studies on a large scale. This review examines key statistical methods for addressing baseline differences and conducting cohort analyses using Korean National Health Insurance claims data. Propensity score matching and inverse probability of treatment weighting are widely used to mitigate selection bias and strengthen causal inference in observational studies. These methods improve study validity by balancing measured covariates between treatment and control groups. Additionally, survival analysis techniques, such as the Cox proportional hazards model, are essential for assessing time-to-event outcomes and estimating hazard ratios while accounting for censoring. However, applying these statistical methods brings challenges, including unmeasured confounding, instability in weight estimation, and violations of model assumptions. To address these limitations, emerging approaches, such as doubly robust estimation, machine learning-based causal inference, and the marginal structural model, have gained prominence. These techniques offer greater flexibility and robustness in real-world data analysis. Future research should focus on refining methodologies for integrating high-dimensional health datasets and leveraging artificial intelligence to enhance predictive modeling and causal inference. Furthermore, the expansion of international collaborations and the adoption of standardized data models will facilitate large-scale multicenter studies. Ethical considerations, including data privacy and algorithmic transparency, should also be prioritized to ensure responsible data use. Maximizing the utility of health insurance claims data requires interdisciplinary collaboration, methodological advancements, and the implementation of rigorous statistical techniques to support evidence-based healthcare policy and improve public health outcomes.
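As an illustration of how inverse probability of treatment weighting balances a covariate between groups, the following is a minimal Python sketch, not the method of any particular study. Everything in it is an assumption made for illustration: the cohort counts in CELLS are invented, there is a single binary confounder x, and the propensity score is estimated as the treated fraction within each stratum of x rather than by the logistic regression that would normally be used with many covariates. Stabilized weights are used because they are one common way to limit the weight instability mentioned above.

```python
from collections import Counter

# Hypothetical cohort, invented for illustration only.
# Key: (confounder x, treatment t, outcome y) -> number of patients.
CELLS = {
    (0, 1, 1): 10, (0, 1, 0): 90,
    (0, 0, 1): 60, (0, 0, 0): 240,
    (1, 1, 1): 90, (1, 1, 0): 210,
    (1, 0, 1): 40, (1, 0, 0): 60,
}
rows = [cell for cell, n in CELLS.items() for _ in range(n)]


def propensity_by_stratum(rows):
    """Estimate P(T=1 | X=x) as the treated fraction within each stratum of x."""
    total, treated = Counter(), Counter()
    for x, t, _ in rows:
        total[x] += 1
        treated[x] += t
    return {x: treated[x] / total[x] for x in total}


def stabilized_weights(rows, ps):
    """Stabilized IPTW: P(T=t) / P(T=t | X=x) for each patient."""
    p_treated = sum(t for _, t, _ in rows) / len(rows)
    weights = []
    for x, t, _ in rows:
        if t == 1:
            weights.append(p_treated / ps[x])
        else:
            weights.append((1 - p_treated) / (1 - ps[x]))
    return weights


def group_mean(rows, weights, arm, column):
    """Weighted mean of one column (0 = x, 2 = y) within a treatment arm."""
    pairs = [(r[column], w) for r, w in zip(rows, weights) if r[1] == arm]
    return sum(v * w for v, w in pairs) / sum(w for _, w in pairs)


ps = propensity_by_stratum(rows)
sw = stabilized_weights(rows, ps)
unit = [1.0] * len(rows)  # unweighted analysis for comparison

# Covariate balance: difference in mean x between arms, before and after weighting.
imbalance_crude = group_mean(rows, unit, 1, 0) - group_mean(rows, unit, 0, 0)
imbalance_iptw = group_mean(rows, sw, 1, 0) - group_mean(rows, sw, 0, 0)

# Risk difference (treated minus untreated), before and after weighting.
crude_rd = group_mean(rows, unit, 1, 2) - group_mean(rows, unit, 0, 2)
iptw_rd = group_mean(rows, sw, 1, 2) - group_mean(rows, sw, 0, 2)

print(f"imbalance in x: crude={imbalance_crude:.3f}, weighted={imbalance_iptw:.3f}")
print(f"risk difference: crude={crude_rd:.3f}, weighted={iptw_rd:.3f}")
```

In this constructed cohort, treatment lowers the risk by 0.10 within each stratum of x, but treated patients more often have x = 1, so the crude risk difference is zero. After weighting, the mean of x is equal in both arms and the risk difference returns to about -0.10. The sketch corrects only for the measured confounder; unmeasured confounding would remain.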