Large language models vs human for classifying clinical documents.

Authors

Mustafa Akram, Naseem Usman, Rahimi Azghadi Mostafa

Affiliations

College of Science and Engineering, James Cook University, Townsville, 4811, QLD, Australia.

School of Computing, Macquarie University, Sydney, 2113, NSW, Australia.

Publication information

Int J Med Inform. 2025 Mar;195:105800. doi: 10.1016/j.ijmedinf.2025.105800. Epub 2025 Jan 21.

DOI: 10.1016/j.ijmedinf.2025.105800
PMID: 39848078

Abstract

BACKGROUND

Accurate classification of medical records is crucial for clinical documentation, particularly when using the 10th revision of the International Classification of Diseases (ICD-10) coding system. The use of machine learning algorithms and Systematized Nomenclature of Medicine (SNOMED) mapping has shown promise in performing these classifications. However, challenges remain, particularly in reducing false negatives, where certain diagnoses are not correctly identified by either approach.

OBJECTIVE

This study explores the potential of leveraging advanced large language models to improve the accuracy of ICD-10 classifications in challenging cases of medical records where machine learning and SNOMED mapping fail.

METHODS

We evaluated the performance of ChatGPT 3.5 and ChatGPT 4 in classifying ICD-10 codes from discharge summaries within selected records of the Medical Information Mart for Intensive Care (MIMIC) IV dataset. These records comprised 802 discharge summaries that both the machine learning and SNOMED mapping methods had classified as false negatives, which marks them as challenging cases. Each summary was assessed by ChatGPT 3.5 and ChatGPT 4 using a classification prompt, and the results were compared to human coder evaluations. Five human coders, with a combined experience of over 30 years, independently classified a stratified sample of 100 summaries to validate ChatGPT's performance.
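
As a rough illustration of how a stratified sample of 100 summaries could be drawn from the 802 records, the sketch below allocates draws to strata in proportion to their size. The abstract does not say which variable was used for stratification or how the allocation was done, so the `code` field, the `stratified_sample` helper, and the largest-remainder allocation are all assumptions made for the example, not the authors' procedure.

```python
import random
from collections import defaultdict


def stratified_sample(records, key, n, seed=0):
    """Draw exactly n records, allocating draws to strata proportionally.

    Hypothetical helper: `key` maps a record to its stratum label
    (assumed here to be an ICD-10 code). Requires n <= len(records).
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for rec in records:
        strata[key(rec)].append(rec)

    total = len(records)
    # Floor of the proportional share for each stratum.
    quotas = {k: (len(v) * n) // total for k, v in strata.items()}
    # Give the remaining draws to the strata with the largest remainders.
    leftover = n - sum(quotas.values())
    by_remainder = sorted(
        strata, key=lambda k: (len(strata[k]) * n) % total, reverse=True
    )
    for k in by_remainder[:leftover]:
        quotas[k] += 1

    sample = []
    for k, items in strata.items():
        sample.extend(rng.sample(items, quotas[k]))
    return sample


# Synthetic stand-in for the 802 summaries; the codes are placeholders.
records = [
    {"id": i, "code": "A" if i < 401 else "B" if i < 642 else "C"}
    for i in range(802)
]
sample = stratified_sample(records, key=lambda r: r["code"], n=100)
```

With these synthetic strata of 401, 241, and 160 records, the allocation yields 50, 30, and 20 summaries respectively.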

RESULTS

ChatGPT 4 demonstrated significantly improved consistency over ChatGPT 3.5, with matching results between runs ranging from 86% to 89%, compared to 57% to 67% for ChatGPT 3.5. The classification accuracy of ChatGPT 4 was variable across different ICD-10 codes. Overall, human coders performed better than ChatGPT. However, ChatGPT matched the median performance of human coders, achieving an accuracy rate of 22%.
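
The consistency figures above are reported as a range of matching results between runs. One plausible way to obtain such a range, sketched here under assumptions, is to compare every pair of runs and report the lowest and highest share of summaries that received the same code. The abstract does not state how matches were counted, so `match_rate`, `pairwise_match_range`, and the toy labels are hypothetical.

```python
from itertools import combinations


def match_rate(labels_a, labels_b):
    """Share of items that received the same label in two runs."""
    if len(labels_a) != len(labels_b):
        raise ValueError("runs must label the same number of summaries")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


def pairwise_match_range(runs):
    """Lowest and highest match rate over all pairs of runs."""
    rates = [match_rate(a, b) for a, b in combinations(runs, 2)]
    return min(rates), max(rates)


# Toy data: three runs over four summaries (placeholder ICD-10 codes).
runs = [
    ["I10", "E11", "N17", "J18"],
    ["I10", "E11", "N17", "I50"],
    ["I10", "E11", "J18", "I50"],
]
low, high = pairwise_match_range(runs)
print(f"between-run match rate: {low:.0%} to {high:.0%}")
```

The same `match_rate` function gives accuracy when one of the two label lists is the reference coding rather than a second model run.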

CONCLUSION

This study underscores the potential of integrating advanced language models with clinical coding processes to improve documentation accuracy. ChatGPT 4 demonstrated improved consistency and comparable performance to median human coders, achieving 22% accuracy in challenging cases. Combining ChatGPT with methods like SNOMED mapping could further enhance clinical coding accuracy, particularly for complex scenarios.

Similar articles

1
Large language models vs human for classifying clinical documents.
Int J Med Inform. 2025 Mar;195:105800. doi: 10.1016/j.ijmedinf.2025.105800. Epub 2025 Jan 21.
2
Development and evaluation of RapTAT: a machine learning system for concept mapping of phrases from medical narratives.
J Biomed Inform. 2014 Apr;48:54-65. doi: 10.1016/j.jbi.2013.11.008. Epub 2013 Dec 4.
3
Computer-Assisted Diagnostic Coding: Effectiveness of an NLP-based approach using SNOMED CT to ICD-10 mappings.
AMIA Annu Symp Proc. 2018 Dec 5;2018:807-816. eCollection 2018.
4
Redefining Health Care Data Interoperability: Empirical Exploration of Large Language Models in Information Exchange.
J Med Internet Res. 2024 May 31;26:e56614. doi: 10.2196/56614.
5
Using SNOMED CT-encoded problems to improve ICD-10-CM coding-A randomized controlled experiment.
Int J Med Inform. 2019 Jun;126:19-25. doi: 10.1016/j.ijmedinf.2019.03.002. Epub 2019 Mar 5.
6
Auto-mapping Clinical Documents to ICD-10 using SNOMED-CT.
AMIA Jt Summits Transl Sci Proc. 2021 May 17;2021:296-304. eCollection 2021.
7
Autonomous International Classification of Diseases Coding Using Pretrained Language Models and Advanced Prompt Learning Techniques: Evaluation of an Automated Analysis System Using Medical Text.
JMIR Med Inform. 2025 Jan 6;13:e63020. doi: 10.2196/63020.
8
Evaluating a Natural Language Processing-Driven, AI-Assisted International Classification of Diseases, 10th Revision, Clinical Modification, Coding System for Diagnosis Related Groups in a Real Hospital Environment: Algorithm Development and Validation Study.
J Med Internet Res. 2024 Sep 20;26:e58278. doi: 10.2196/58278.
9
Mapping the categories of the Swedish primary health care version of ICD-10 to SNOMED CT concepts: rule development and intercoder reliability in a mapping trial.
BMC Med Inform Decis Mak. 2007 May 2;7:9. doi: 10.1186/1472-6947-7-9.
10
ChatGPT's performance in German OB/GYN exams - paving the way for AI-enhanced medical education and clinical practice.
Front Med (Lausanne). 2023 Dec 13;10:1296615. doi: 10.3389/fmed.2023.1296615. eCollection 2023.

Cited by

1
Can LLMs effectively assist medical coding? Evaluating GPT performance on DRG and targeted clinical tasks.
BMC Med Inform Decis Mak. 2025 Aug 19;25(1):312. doi: 10.1186/s12911-025-03151-z.
2
Enhancing medical coding efficiency through domain-specific fine-tuned large language models.
Npj Health Syst. 2025;2(1):14. doi: 10.1038/s44401-025-00018-3. Epub 2025 May 1.