Automatic multilabel detection of ICD10 codes in Dutch cardiology discharge letters using neural networks.

Authors

Sammani Arjan, Bagheri Ayoub, van der Heijden Peter G M, Te Riele Anneline S J M, Baas Annette F, Oosters C A J, Oberski Daniel, Asselbergs Folkert W

Affiliations

Department of Cardiology, Division of Heart & Lungs, University Medical Centre Utrecht, University of Utrecht, Utrecht, The Netherlands.

Department of Methodology and Statistics, Faculty of Social Sciences, Utrecht University, Utrecht, The Netherlands.

Publication information

NPJ Digit Med. 2021 Feb 26;4(1):37. doi: 10.1038/s41746-021-00404-9.

DOI:10.1038/s41746-021-00404-9
PMID:33637859
Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC7910461/
Abstract

Standard reference terminology for diagnoses and risk factors is crucial for billing, epidemiological studies, and international and intranational comparisons of diseases. The International Classification of Diseases (ICD) is a standardized and widely used system, but manual classification is enormously time-consuming. Natural language processing combined with machine learning allows automated structuring of diagnoses using ICD-10 codes, but the limited performance of machine learning models, the need for very large datasets, and the poor reliability of the terminal parts of these codes have restricted clinical usability. We aimed to create a high-performing pipeline for automated classification of reliable ICD-10 codes in free-text medical records in cardiology. We focussed on frequently used and well-defined three- and four-character ICD-10 codes that still have enough granularity to be clinically relevant, such as atrial fibrillation (I48), acute myocardial infarction (I21), or dilated cardiomyopathy (I42.0). Our pipeline uses a deep neural network known as a Bidirectional Gated Recurrent Unit Neural Network and was trained and tested with 5548 discharge letters and validated in 5089 discharge and procedural letters. Because discharge letters in clinical practice may be labeled with more than one code, we assessed the single- and multilabel performance of main diagnoses and cardiovascular risk factors. We investigated using both the entire body of text and only the summary paragraph, supplemented by age and sex. Given the privacy-sensitive information included in discharge letters, we added a de-identification step. The performance was high, with F1 scores of 0.76-0.99 for three-character and 0.87-0.98 for four-character ICD-10 codes, and was best when using complete discharge letters. Adding the variables age and sex did not affect results. For model interpretability, word coefficients were provided and a qualitative assessment of classification was performed manually. Because of its high performance, this pipeline can be useful to decrease the administrative burden of classifying discharge diagnoses and may serve as a scaffold for reimbursement and research applications.
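
The abstract reports per-code F1 scores for letters that may carry more than one ICD-10 code. As a rough illustration of how such a multilabel evaluation can be set up, the sketch below truncates codes to three or four characters, thresholds per-label probabilities, and computes F1 per code. It is not the authors' code: the function names (`truncate_code`, `multi_hot`, `threshold`, `per_label_f1`), the 0.5 cutoff, and the toy letters and probabilities are all assumptions made for this example.

```python
from typing import Dict, List, Sequence, Set


def truncate_code(code: str, n_chars: int) -> str:
    """Keep the first n_chars alphanumeric characters of an ICD-10 code.

    Hypothetical helper: 'I42.0' -> 'I42' for n_chars=3, 'I42.0' for n_chars=4.
    """
    compact = code.replace(".", "").upper()[:n_chars]
    if len(compact) <= 3:
        return compact
    return compact[:3] + "." + compact[3:]


def multi_hot(label_sets: Sequence[Set[str]], labels: Sequence[str]) -> List[List[int]]:
    """Encode each letter's set of codes as a 0/1 vector over `labels`."""
    return [[1 if lab in codes else 0 for lab in labels] for codes in label_sets]


def threshold(probs: Sequence[Sequence[float]], cutoff: float = 0.5) -> List[List[int]]:
    """Turn per-label probabilities into 0/1 predictions (cutoff is an assumption)."""
    return [[1 if p >= cutoff else 0 for p in row] for row in probs]


def per_label_f1(
    y_true: Sequence[Sequence[int]],
    y_pred: Sequence[Sequence[int]],
    labels: Sequence[str],
) -> Dict[str, float]:
    """F1 = 2*TP / (2*TP + FP + FN), computed separately for each label."""
    scores: Dict[str, float] = {}
    for j, lab in enumerate(labels):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 0 and p[j] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 0)
        denom = 2 * tp + fp + fn
        scores[lab] = 2 * tp / denom if denom else 0.0
    return scores


# Toy data (invented): four letters with their gold codes.
raw_codes = [["I48"], ["I21", "I48"], ["I42.0"], ["I21"]]
labels = ["I21", "I42", "I48"]  # three-character targets

gold_sets = [{truncate_code(c, 3) for c in codes} for codes in raw_codes]
y_true = multi_hot(gold_sets, labels)

# Invented per-label probabilities, standing in for a classifier's sigmoid outputs.
probs = [
    [0.1, 0.2, 0.9],
    [0.8, 0.1, 0.4],
    [0.2, 0.7, 0.1],
    [0.6, 0.1, 0.7],
]
y_pred = threshold(probs)

scores = per_label_f1(y_true, y_pred, labels)
for lab in labels:
    print(f"{lab}: F1 = {scores[lab]:.2f}")
```

Scoring each code separately, rather than pooling all labels into one number, matches the way the abstract states results as a range of F1 values across codes.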


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/aacf/7910461/148054f83d97/41746_2021_404_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/aacf/7910461/42fcd712806f/41746_2021_404_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/aacf/7910461/c022e5710000/41746_2021_404_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/aacf/7910461/2ee029475b97/41746_2021_404_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/aacf/7910461/0fcfa0c0a13a/41746_2021_404_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/aacf/7910461/9fe7991cbf25/41746_2021_404_Fig6_HTML.jpg