Machine learning for automated cause-of-death classification from 2021 to 2022 in Korea: development and validation of an ICD-10 prediction model.

Author information

Lee Seokmin, Im Gyeongmin

Affiliation

Statistics Research Institute, Statistics Korea, Daejeon, Korea.

Publication information

Ewha Med J. 2025 Jul;48(3):e45. doi: 10.12771/emj.2025.00675. Epub 2025 Jul 28.

DOI: 10.12771/emj.2025.00675
PMID: 40739970
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12362283/
Abstract

PURPOSE

This study evaluated the feasibility and performance of a deep learning approach utilizing the Korean Medical BERT (KM-BERT) model for the automated classification of underlying causes of death within national mortality statistics. It aimed to assess predictive accuracy throughout the cause-of-death coding workflow and to identify limitations and opportunities for further artificial intelligence (AI) integration.

METHODS

We performed a retrospective prediction study using 693,587 death certificates issued in Korea between January 2021 and December 2022. Free-text fields for immediate, antecedent, and contributory causes were concatenated and used to fine-tune KM-BERT. Three classification models were developed: (1) final underlying cause prediction (International Classification of Diseases, 10th Revision [ICD-10] code) from certificate inputs, (2) tentative underlying cause selection based on ICD-10 Volume 2 rules, and (3) classification of individual cause-of-death entries. Models were trained and validated using 2021 data (80% training, 20% validation) and evaluated on 2022 data. Performance metrics included overall accuracy, weighted F1 score, and macro F1 score.
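The input construction and temporal split described in METHODS can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the `Certificate` record layout, the `[SEP]` separator string, and the use of a seeded random shuffle for the 80/20 split of 2021 data are all hypothetical, because the abstract does not specify them. Tokenization and the KM-BERT fine-tuning step itself are omitted.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Certificate:
    # Hypothetical record layout; the study's actual schema is not given in the abstract.
    year: int
    immediate: str
    antecedent: list = field(default_factory=list)  # zero or more antecedent causes (assumed)
    contributory: str = ""
    underlying_icd10: str = ""                      # training label: final underlying cause


SEP = " [SEP] "  # assumed separator between cause fields


def build_input(cert):
    """Join the non-empty free-text cause fields, in certificate order, into one string."""
    parts = [cert.immediate, *cert.antecedent, cert.contributory]
    return SEP.join(p.strip() for p in parts if p and p.strip())


def temporal_split(certs, train_frac=0.8, seed=0):
    """Split 2021 certificates into train/validation (assumed random) and keep 2022 as test."""
    dev = [c for c in certs if c.year == 2021]
    test = [c for c in certs if c.year == 2022]
    random.Random(seed).shuffle(dev)
    cut = int(len(dev) * train_frac)
    return dev[:cut], dev[cut:], test


# Hypothetical example certificate; the ICD-10 label is illustrative only.
cert = Certificate(2021, "Pneumonia", ["Cerebral infarction"], "Hypertension", "I63")
print(build_input(cert))  # Pneumonia [SEP] Cerebral infarction [SEP] Hypertension
```

Splitting by year rather than at random across both years mirrors the abstract's design of evaluating on a later period than the model was trained on; whether the 2021 train/validation split was random or stratified is not stated.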

RESULTS

On 306,898 certificates from 2022, the final cause model achieved 62.65% accuracy (F1-weighted, 0.5940; F1-macro, 0.1503). The tentative cause model demonstrated 95.35% accuracy (F1-weighted, 0.9516; F1-macro, 0.4996). The individual entry model yielded 79.51% accuracy (F1-weighted, 0.7741; F1-macro, 0.9250). Error analysis indicated reduced reliability for rare diseases and for specific ICD chapters, which require supplementary administrative data.
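The three metrics reported above can be computed as in the sketch below. It assumes scikit-learn-style definitions (per-class F1 over the union of true and predicted labels, averaged uniformly for macro F1 and weighted by true-class support for weighted F1); the abstract does not say which implementation the authors used. The toy labels are hypothetical and show how a model that never predicts rare codes can keep high accuracy and weighted F1 while macro F1 drops sharply, the same direction of gap seen for the final cause model (F1-weighted 0.5940 vs F1-macro 0.1503).

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Return (accuracy, weighted F1, macro F1) for single-label predictions.

    Per-class F1 is computed over the union of true and predicted labels; a class
    with no predictions (or no true instances) gets precision (or recall) 0.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    support = Counter(y_true)    # true instances per class
    predicted = Counter(y_pred)  # predicted instances per class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    f1 = {}
    for label in set(support) | set(predicted):
        precision = correct[label] / predicted[label] if predicted[label] else 0.0
        recall = correct[label] / support[label] if support[label] else 0.0
        denom = precision + recall
        f1[label] = 2 * precision * recall / denom if denom else 0.0

    n = len(y_true)
    accuracy = sum(correct.values()) / n
    weighted_f1 = sum(f1[c] * support[c] for c in f1) / n  # weights = true-class support
    macro_f1 = sum(f1.values()) / len(f1)                  # every class counts equally
    return accuracy, weighted_f1, macro_f1


# Hypothetical toy labels: one frequent code and two rare codes that are never predicted.
y_true = ["I21"] * 8 + ["C34", "J18"]
y_pred = ["I21"] * 10
acc, weighted, macro = classification_metrics(y_true, y_pred)
print(f"accuracy={acc:.4f} weighted_f1={weighted:.4f} macro_f1={macro:.4f}")
# accuracy=0.8000 weighted_f1=0.7111 macro_f1=0.2963
```

Because macro F1 gives each ICD-10 code equal weight, it is far more sensitive than weighted F1 to the rare-disease errors noted in the error analysis.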

CONCLUSION

Although the models performed well in mapping free-text inputs and selecting tentative underlying causes, data quality, integration of administrative records, and the models themselves still need improvement. A systematic, long-term approach is essential for the broad adoption of AI in mortality statistics.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5838/12362283/6c8d1326d464/emj-2025-00675f1.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5838/12362283/8b808a5b95f2/emj-2025-00675f2.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5838/12362283/00ba89a5036e/emj-2025-00675f3.jpg

Similar articles

1. Developing an ICD-10 Coding Assistant: Pilot Study Using RoBERTa and GPT-4 for Term Extraction and Description-Based Code Selection.
   JMIR Form Res. 2025 Feb 11;9:e60095. doi: 10.2196/60095.
2. Development and Validation of a Convolutional Neural Network Model to Predict a Pathologic Fracture in the Proximal Femur Using Abdomen and Pelvis CT Images of Patients With Advanced Cancer.
   Clin Orthop Relat Res. 2023 Nov 1;481(11):2247-2256. doi: 10.1097/CORR.0000000000002771. Epub 2023 Aug 23.
3. Prescription of Controlled Substances: Benefits and Risks.
4. Leveraging a foundation model zoo for cell similarity search in oncological microscopy across devices.
   Front Oncol. 2025 Jun 18;15:1480384. doi: 10.3389/fonc.2025.1480384. eCollection 2025.
5. A deep learning approach to direct immunofluorescence pattern recognition in autoimmune bullous diseases.
   Br J Dermatol. 2024 Jul 16;191(2):261-266. doi: 10.1093/bjd/ljae142.
6. Deep Learning and Image Generator Health Tabular Data (IGHT) for Predicting Overall Survival in Patients With Colorectal Cancer: Retrospective Study.
   JMIR Med Inform. 2025 Aug 19;13:e75022. doi: 10.2196/75022.
7. Comparison of Two Modern Survival Prediction Tools, SORG-MLA and METSSS, in Patients With Symptomatic Long-bone Metastases Who Underwent Local Treatment With Surgery Followed by Radiotherapy and With Radiotherapy Alone.
   Clin Orthop Relat Res. 2024 Dec 1;482(12):2193-2208. doi: 10.1097/CORR.0000000000003185. Epub 2024 Jul 23.
8. A Pilot Study of Breast Cancer Histopathological Image Classification Using Google Teachable Machine: A No-Code Artificial Intelligence Approach.
   Cureus. 2025 Jul 4;17(7):e87301. doi: 10.7759/cureus.87301. eCollection 2025 Jul.
9. Proposal for Using AI to Assess Clinical Data Integrity and Generate Metadata: Algorithm Development and Validation.
   JMIR Med Inform. 2025 Jun 30;13:e60204. doi: 10.2196/60204.

Cited by

1. Ewha Medical Journal's inclusion in PubMed Central and PubMed, and artificial intelligence and guidelines in this issue.
   Ewha Med J. 2025 Jul;48(3):e38. doi: 10.12771/emj.2025.00710. Epub 2025 Jul 31.

References

1. A pre-trained BERT for Korean medical natural language processing.
   Sci Rep. 2022 Aug 16;12(1):13847. doi: 10.1038/s41598-022-17806-8.