Using natural language processing for automated classification of disease and to identify misclassified ICD codes in cardiac disease.

Author information

Falter Maarten, Godderis Dries, Scherrenberg Martijn, Kizilkilic Sevda Ece, Xu Linqi, Mertens Marc, Jansen Jan, Legroux Pascal, Kindermans Hanne, Sinnaeve Peter, Neven Frank, Dendale Paul

Affiliations

Faculty of Medicine and Life Sciences, Hasselt University, Agoralaan gebouw D, 3590 Diepenbeek, Hasselt, Belgium.

Heart Centre Hasselt, Jessa Hospital, Stadsomvaart 11, 3500 Hasselt, Belgium.

Publication information

Eur Heart J Digit Health. 2024 Feb 9;5(3):229-234. doi: 10.1093/ehjdh/ztae008. eCollection 2024 May.

DOI:10.1093/ehjdh/ztae008
PMID:38774372
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11104467/
Abstract

AIMS

International Classification of Diseases (ICD) codes are used to classify hospitalizations for administrative, financial, and research purposes. However, coding errors are known to occur. Natural language processing (NLP) offers promising ways to optimize the coding process. This study aimed to investigate methods for the automatic classification of disease in unstructured medical records using NLP and to compare them with conventional ICD coding.
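The simplest form of automatic classification of free-text records is a rule-based keyword search, which is also the first method listed under Methods and Results. The sketch below illustrates that idea only. The keyword lists, the 30-character negation window, and the function name `rule_based_labels` are all hypothetical, because the abstract does not give the study's actual rules.

```python
import re

# Hypothetical keyword lists: the study's actual search rules are not given in the abstract.
PATTERNS = {
    "AF": re.compile(r"\b(?:atrial fibrillation|afib|a-fib)\b", re.IGNORECASE),
    "HF": re.compile(r"\b(?:heart failure|cardiac failure|chf)\b", re.IGNORECASE),
}

# Naive negation rule (an assumption): a cue word earlier in the same sentence,
# ending at most 30 characters before the mention.
NEGATION = re.compile(r"\b(?:no|denies|without|negative for)\b[^.]{0,30}$", re.IGNORECASE)


def rule_based_labels(report: str) -> dict[str, bool]:
    """Label each diagnosis True if the report has at least one non-negated mention."""
    labels = {}
    for diagnosis, pattern in PATTERNS.items():
        labels[diagnosis] = False
        for match in pattern.finditer(report):
            preceding = report[: match.start()]
            # Keep only the part of the current sentence that precedes the mention.
            sentence_prefix = preceding[preceding.rfind(".") + 1 :]
            if not NEGATION.search(sentence_prefix):
                labels[diagnosis] = True
                break
    return labels
```

For example, `rule_based_labels("History of atrial fibrillation. No signs of heart failure.")` returns `{"AF": True, "HF": False}`. Hand-written rules like these tend to miss abbreviations, misspellings, and more complex negation, which is one reason to compare them against learned classifiers.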

METHODS AND RESULTS

Two datasets were used: the open-source Medical Information Mart for Intensive Care (MIMIC)-III dataset (n = 55,177) and a dataset from a hospital in Belgium (n = 12,706). Automated searches using NLP algorithms were performed for the diagnoses 'atrial fibrillation (AF)' and 'heart failure (HF)'. Four methods were used: rule-based search, logistic regression, Extreme Gradient Boosting (XGBoost) on a term frequency-inverse document frequency (TF-IDF) matrix, and Bio-Bidirectional Encoder Representations from Transformers (BioBERT). All algorithms were developed on the MIMIC-III dataset, and the best-performing algorithm was then deployed on the Belgian dataset. After preprocessing, 1438 reports were retained in the Belgian dataset. XGBoost on the TF-IDF matrix achieved an accuracy of 0.94 for AF and 0.92 for HF. There were 211 mismatches between the algorithm's labels and the ICD codes. Of these, 103 were due to differences in data availability or in definitions. Of the remaining 108 mismatches, 70% were due to incorrect labelling by the algorithm and 30% to erroneous ICD coding (2% of total hospitalizations).
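The best-performing model was trained on a TF-IDF matrix. The sketch below shows how such a matrix can be built from raw reports using only the standard library. It assumes a simple letters-only tokenizer and the smoothed IDF with L2 row normalisation that scikit-learn's `TfidfVectorizer` uses by default. The study's actual preprocessing is not described in the abstract, and the XGBoost step is not shown. In practice, the resulting rows would be passed to a gradient-boosted classifier such as `XGBClassifier` from the `xgboost` package.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split on non-letters (a deliberately simple, assumed tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_matrix(docs: list[str]) -> tuple[list[str], list[list[float]]]:
    """Build a dense TF-IDF matrix: one row per document, one column per vocabulary term."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({term for tokens in tokenized for term in tokens})
    n_docs = len(docs)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF: ln((1 + n) / (1 + df)) + 1, so no term gets a zero weight.
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}
    rows = []
    for tokens in tokenized:
        counts = Counter(tokens)
        row = [counts[t] * idf[t] for t in vocab]  # raw term count times IDF
        # L2-normalise each row so long reports do not dominate short ones.
        norm = math.sqrt(sum(v * v for v in row)) or 1.0
        rows.append([v / norm for v in row])
    return vocab, rows
```

Each column is a term, so a tree-based model such as XGBoost can split on the weight of informative words (for example "fibrillation") without any hand-written rules.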

CONCLUSION

A newly developed NLP algorithm achieved high accuracy in classifying disease in medical records. XGBoost outperformed the deep learning technique BioBERT. NLP algorithms could be used to identify ICD-coding errors and to support and optimize the ICD-coding process.
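To use an NLP model as a check on ICD coding, one approach is to derive a label from the assigned codes, compare it with the model's label, and flag any disagreements for manual review. The sketch below makes several assumptions. It treats ICD-10 category I48 (atrial fibrillation and flutter) as the AF code and I50 (heart failure) as the HF code. The `Hospitalization` record and its field names are hypothetical. Records coded in ICD-9, as MIMIC-III is, would need different prefixes.

```python
from dataclasses import dataclass

# Assumed ICD-10 category prefixes; I48 also covers atrial flutter.
ICD10_PREFIXES = {"AF": ("I48",), "HF": ("I50",)}


@dataclass
class Hospitalization:
    stay_id: str                 # hypothetical identifier
    icd_codes: list[str]         # codes assigned by the coders
    nlp_labels: dict[str, bool]  # labels predicted by the NLP model


def icd_label(codes: list[str], diagnosis: str) -> bool:
    """True if any assigned code starts with one of the diagnosis's prefixes."""
    prefixes = ICD10_PREFIXES[diagnosis]
    return any(code.replace(".", "").upper().startswith(prefixes) for code in codes)


def find_mismatches(stays: list[Hospitalization], diagnosis: str) -> list[tuple[str, bool, bool]]:
    """Return (stay_id, icd_label, nlp_label) for every stay where the two sources disagree."""
    mismatches = []
    for stay in stays:
        coded = icd_label(stay.icd_codes, diagnosis)
        predicted = stay.nlp_labels.get(diagnosis, False)
        if coded != predicted:
            mismatches.append((stay.stay_id, coded, predicted))
    return mismatches
```

Each flagged stay still needs human review, because a disagreement can reflect an algorithm error, a coding error, or a difference in data or definitions. The abstract's figures show the scale of this review. Of the 211 mismatches, 211 − 103 = 108 remained after excluding data-availability and definition differences. 30% of 108 is about 32 ICD-coding errors. Relative to the 1438 retained reports, that is about 2.2%, consistent with the reported 2%.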

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e304/11104467/7f8fcee671e2/ztae008_ga.jpg
