Neural Translation and Automated Recognition of ICD-10 Medical Entities From Natural Language: Model Development and Performance Assessment.

Authors

Falissard Louis, Morgand Claire, Ghosn Walid, Imbaud Claire, Bounebache Karim, Rey Grégoire

Affiliation

Centre for Epidemiology on Medical Causes of Death, Inserm, Le Kremlin Bicêtre, France.

Publication details

JMIR Med Inform. 2022 Apr 11;10(4):e26353. doi: 10.2196/26353.

DOI:10.2196/26353
PMID:35404262
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9039820/
Abstract

BACKGROUND

The recognition of medical entities from natural language is a ubiquitous problem in the medical field, with applications ranging from medical coding to the analysis of electronic health data for public health. It is, however, a complex task usually requiring human expert intervention, which makes it expensive and time-consuming. Recent advances in artificial intelligence, specifically the rise of deep learning methods, have enabled computers to make efficient decisions on a number of complex problems, a notable example being neural sequence models and their powerful applications in natural language processing. However, these models require a considerable amount of training data, which is typically their main limiting factor. The Centre for Epidemiology on Medical Causes of Death (CépiDc) stores an exhaustive database of death certificates at the French national scale, amounting to several million natural language examples, each paired with its human-coded medical entities, available to the machine learning practitioner.

OBJECTIVE

The aim of this paper was to investigate the application of deep neural sequence models to the problem of medical entity recognition from natural language.

METHODS

The investigated data set included every French death certificate from 2011 to 2016. These certificates contain information such as the subject's age, the subject's gender, and the chain of events leading to his or her death, both in French and encoded as International Statistical Classification of Diseases and Related Health Problems, Tenth Revision (ICD-10) medical entities, for a total of around 3 million observations in the data set. The task of automatically recognizing ICD-10 medical entities from the French natural language-based chain of events leading to death was formulated as a sequence-to-sequence modeling problem, a type of predictive modeling problem. A deep neural network-based model, known as the Transformer, was then slightly adapted and fit to the data set. Its performance was assessed on an external data set and compared to the current state-of-the-art approach. CIs for derived measurements were estimated via bootstrapping.
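As a rough illustration of the sequence-to-sequence formulation, the sketch below flattens one certificate into a source token sequence (the French text) and a target token sequence (ICD-10 codes). The `Certificate` fields, the `<sep>` line separator, the age/sex prefix tokens, and whitespace tokenization are all hypothetical choices made for illustration, not the authors' preprocessing; the example text and codes are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Certificate:
    """Hypothetical container for one death certificate."""
    age: int
    sex: str                # e.g. "M" or "F"
    lines: list[str]        # free-text causal chain, one entry per certificate line
    codes: list[list[str]]  # ICD-10 codes assigned by human coders to each line


LINE_SEP = "<sep>"  # hypothetical token marking a certificate line boundary


def to_seq2seq_pair(cert: Certificate) -> tuple[list[str], list[str]]:
    """Flatten a certificate into (source tokens, target tokens).

    Assumption: age and sex are prepended as special tokens and the text is
    split on whitespace; a real system would likely use a subword tokenizer.
    """
    if len(cert.lines) != len(cert.codes):
        raise ValueError("each certificate line needs its own list of codes")
    source = [f"<age={cert.age}>", f"<sex={cert.sex}>"]
    target: list[str] = []
    for i, (text, codes) in enumerate(zip(cert.lines, cert.codes)):
        if i > 0:
            # Insert the same separator on both sides so line structure is visible in each sequence.
            source.append(LINE_SEP)
            target.append(LINE_SEP)
        source.extend(text.lower().split())
        target.extend(codes)
    return source, target


cert = Certificate(
    age=83,
    sex="F",
    lines=["Pneumopathie d'inhalation", "Accident vasculaire cérébral"],
    codes=[["J690"], ["I64"]],
)
src, tgt = to_seq2seq_pair(cert)
print(src)  # ['<age=83>', '<sex=F>', 'pneumopathie', "d'inhalation", '<sep>', 'accident', 'vasculaire', 'cérébral']
print(tgt)  # ['J690', '<sep>', 'I64']
```

A model such as the Transformer is then trained to generate `tgt` token by token given `src`; framing the output as a sequence lets one line yield zero, one, or several codes without a fixed-size output layer per line.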

RESULTS

The proposed approach resulted in an F-measure value of 0.952 (95% CI 0.946-0.957), which constitutes a significant improvement over the current state-of-the-art approach and its previously reported F-measure value of 0.825 as assessed on a comparable data set. Such an improvement makes possible a whole field of new applications, from nosologist-level automated coding to temporal harmonization of death statistics.
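The F-measure is the harmonic mean of precision and recall. The sketch below shows how such a score and a percentile bootstrap CI could be computed, assuming codes are compared per certificate as multisets, counts are pooled across certificates (micro-averaging), and whole certificates are resampled with replacement. These choices, the function names, and the toy data are assumptions, not necessarily the paper's exact evaluation protocol.

```python
import random
from collections import Counter


def micro_f_measure(pairs):
    """Micro-averaged F-measure over (gold_codes, predicted_codes) pairs.

    Codes are compared per certificate as multisets, so a code predicted
    twice but present once in the gold standard counts as one match and
    one false positive.
    """
    tp = n_pred = n_gold = 0
    for gold, pred in pairs:
        g, p = Counter(gold), Counter(pred)
        tp += sum((g & p).values())  # multiset intersection = matched codes
        n_pred += len(pred)
        n_gold += len(gold)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def bootstrap_ci(pairs, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI, resampling whole certificates with replacement."""
    rng = random.Random(seed)
    n = len(pairs)
    stats = sorted(
        micro_f_measure([pairs[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_resamples)
    )
    lower = stats[int(n_resamples * alpha / 2)]
    upper = stats[int(n_resamples * (1 - alpha / 2)) - 1]
    return lower, upper


pairs = [
    (["J690", "I64"], ["J690", "I64"]),  # fully correct
    (["C349"], ["C349", "J189"]),        # one extra (false-positive) code
    (["I219"], ["I251"]),                # wrong code
]
print(round(micro_f_measure(pairs), 4))  # 0.6667 (precision 3/5, recall 3/4)
print(bootstrap_ci(pairs))
```

With only three toy certificates the interval is meaningless; on a test set of realistic size, resampling at the certificate level keeps codes from the same certificate together, which respects their dependence.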

CONCLUSIONS

This paper shows that a deep artificial neural network can directly learn from voluminous data sets in order to identify complex relationships between natural language and medical entities, without any explicit prior knowledge. Although not entirely free from mistakes, the derived model constitutes a powerful tool for automated coding of medical entities from medical language with promising potential applications.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2938/9039820/b053423b220e/medinform_v10i4e26353_fig1.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2938/9039820/d8169c384b91/medinform_v10i4e26353_fig2.jpg
Similar articles

1
Neural Translation and Automated Recognition of ICD-10 Medical Entities From Natural Language: Model Development and Performance Assessment.
JMIR Med Inform. 2022 Apr 11;10(4):e26353. doi: 10.2196/26353.
2
A Deep Artificial Neural Network-Based Model for Prediction of Underlying Cause of Death From Death Certificates: Algorithm Development and Validation.
JMIR Med Inform. 2020 Apr 28;8(4):e17125. doi: 10.2196/17125.
3
Evaluating a Natural Language Processing-Driven, AI-Assisted International Classification of Diseases, 10th Revision, Clinical Modification, Coding System for Diagnosis Related Groups in a Real Hospital Environment: Algorithm Development and Validation Study.
J Med Internet Res. 2024 Sep 20;26:e58278. doi: 10.2196/58278.
4
Artificial Intelligence Learning Semantics via External Resources for Classifying Diagnosis Codes in Discharge Notes.
J Med Internet Res. 2017 Nov 6;19(11):e380. doi: 10.2196/jmir.8344.
5
Entity recognition from clinical texts via recurrent neural network.
BMC Med Inform Decis Mak. 2017 Jul 5;17(Suppl 2):67. doi: 10.1186/s12911-017-0468-7.
6
Automatic construction of rule-based ICD-9-CM coding systems.
BMC Bioinformatics. 2008 Apr 11;9 Suppl 3(Suppl 3):S10. doi: 10.1186/1471-2105-9-S3-S10.
7
Interpretable deep learning to map diagnostic texts to ICD-10 codes.
Int J Med Inform. 2019 Sep;129:49-59. doi: 10.1016/j.ijmedinf.2019.05.015. Epub 2019 May 22.
8
Clinical Information Extraction at the CLEF eHealth Evaluation lab 2016.
CEUR Workshop Proc. 2016 Sep;1609:28-42.
9
Combining deep neural networks, a rule-based expert system and targeted manual coding for ICD-10 coding causes of death of French death certificates from 2018 to 2019.
Int J Med Inform. 2024 Aug;188:105462. doi: 10.1016/j.ijmedinf.2024.105462. Epub 2024 Apr 26.
10
Automatic ICD-10 Coding and Training System: Deep Neural Network Based on Supervised Learning.
JMIR Med Inform. 2021 Aug 31;9(8):e23230. doi: 10.2196/23230.

Cited by

1
Person-centered care at population scale: The Swedish registry for behavioral and psychological symptoms of dementia.
Alzheimers Dement (N Y). 2025 Feb 24;11(1):e70057. doi: 10.1002/trc2.70057. eCollection 2025 Jan-Mar.
2
Real-Time Classification of Causes of Death Using AI: Sensitivity Analysis.
JMIR AI. 2023 Nov 22;2:e40965. doi: 10.2196/40965.
3
Year 2022 in Medical Natural Language Processing: Availability of Language Models as a Step in the Democratization of NLP in the Biomedical Area.
Yearb Med Inform. 2023 Aug;32(1):244-252. doi: 10.1055/s-0043-1768752. Epub 2023 Dec 26.

References

1
Interpretable deep learning to map diagnostic texts to ICD-10 codes.
Int J Med Inform. 2019 Sep;129:49-59. doi: 10.1016/j.ijmedinf.2019.05.015. Epub 2019 May 22.
2
Clinical Information Extraction at the CLEF eHealth Evaluation lab 2016.
CEUR Workshop Proc. 2016 Sep;1609:28-42.