
Training a Deep Contextualized Language Model for International Classification of Diseases, 10th Revision Classification via Federated Learning: Model Development and Validation Study.

Authors

Chen Pei-Fu, He Tai-Liang, Lin Sheng-Che, Chu Yuan-Chia, Kuo Chen-Tsung, Lai Feipei, Wang Ssu-Ming, Zhu Wan-Xuan, Chen Kuan-Chih, Kuo Lu-Cheng, Hung Fang-Ming, Lin Yu-Cheng, Tsai I-Chang, Chiu Chi-Hao, Chang Shu-Chih, Yang Chi-Yu

Affiliations

Graduate Institute of Biomedical Electronics and Bioinformatics, National Taiwan University, Taipei, Taiwan.

Department of Anesthesiology, Far Eastern Memorial Hospital, New Taipei City, Taiwan.

Publication

JMIR Med Inform. 2022 Nov 10;10(11):e41342. doi: 10.2196/41342.

DOI:10.2196/41342
PMID:36355417
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9693720/
Abstract

BACKGROUND

The automatic coding of clinical text documents by using the International Classification of Diseases, 10th Revision (ICD-10) can be performed for statistical analyses and reimbursements. With the development of natural language processing models, new transformer architectures with attention mechanisms have outperformed previous models. Although multicenter training may increase a model's performance and external validity, the privacy of clinical documents should be protected. We used federated learning to train a model with multicenter data, without sharing data per se.

OBJECTIVE

This study aims to train a classification model via federated learning for ICD-10 multilabel classification.

METHODS

Text data from discharge notes in electronic medical records were collected from the following three medical centers: Far Eastern Memorial Hospital, National Taiwan University Hospital, and Taipei Veterans General Hospital. After comparing the performance of different variants of bidirectional encoder representations from transformers (BERT), PubMedBERT was chosen for the word embeddings. With regard to preprocessing, the nonalphanumeric characters were retained because the model's performance decreased after the removal of these characters. To explain the outputs of our model, we added a label attention mechanism to the model architecture. The model was trained with data from each of the three hospitals separately and via federated learning. The models trained via federated learning and the models trained with local data were compared on a testing set that was composed of data from the three hospitals. The micro F score was used to evaluate model performance across all 3 centers.
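The federated training described above can be sketched as weighted parameter averaging (FedAvg-style): each hospital trains on its own discharge notes and shares only model weights, which a coordinator averages in proportion to local data size. The function name, the toy parameter lists, and the per-site note counts below are illustrative assumptions, not the study's actual implementation.

```python
import numpy as np

def federated_average(local_weights, local_sizes):
    # Weighted average of per-site parameter lists; only the weights are
    # shared between sites, never the clinical text itself.
    total = sum(local_sizes)
    averaged = []
    for layer_idx in range(len(local_weights[0])):
        layer = sum(
            w[layer_idx] * (n / total)
            for w, n in zip(local_weights, local_sizes)
        )
        averaged.append(layer)
    return averaged

# One toy aggregation round with three sites (stand-ins for the three
# hospitals); each "model" here is a single two-parameter layer.
site_weights = [
    [np.array([1.0, 1.0])],
    [np.array([2.0, 2.0])],
    [np.array([4.0, 4.0])],
]
site_sizes = [1, 1, 2]  # hypothetical number of discharge notes per site
global_weights = federated_average(site_weights, site_sizes)
print(global_weights[0])
```

In a full training loop, the averaged weights would be sent back to each site to initialize the next local training round.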

RESULTS

The F scores of PubMedBERT, RoBERTa (Robustly Optimized BERT Pretraining Approach), ClinicalBERT, and BioBERT (BERT for Biomedical Text Mining) were 0.735, 0.692, 0.711, and 0.721, respectively. The F score of the model that retained nonalphanumeric characters was 0.8120, whereas the F score after removing these characters was 0.7875, a decrease of 0.0245 (3.11%). The F scores on the testing set were 0.6142, 0.4472, 0.5353, and 0.2522 for the federated learning, Far Eastern Memorial Hospital, National Taiwan University Hospital, and Taipei Veterans General Hospital models, respectively. The explainable predictions were displayed with highlighted input words via the label attention architecture.
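The micro F score used above pools true positives, false positives, and false negatives over every (document, ICD-10 code) pair before computing precision and recall, so frequently assigned codes dominate the score. A minimal sketch, with made-up code sets rather than the study's data:

```python
def micro_f1(y_true, y_pred):
    # y_true / y_pred: one set of ICD-10 codes per document.
    # Counts are pooled over all (document, code) pairs, then a single
    # precision/recall/F1 is computed from the pooled counts.
    tp = sum(len(t & p) for t, p in zip(y_true, y_pred))
    fp = sum(len(p - t) for t, p in zip(y_true, y_pred))
    fn = sum(len(t - p) for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Two documents: gold vs. predicted code sets (codes are illustrative).
gold = [{"I10", "E11.9"}, {"J18.9"}]
pred = [{"I10"}, {"J18.9", "I10"}]
print(round(micro_f1(gold, pred), 3))
```

This matches scikit-learn's `f1_score(..., average="micro")` behavior on binarized multilabel data, which is the conventional way such scores are computed.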

CONCLUSIONS

Federated learning was used to train the ICD-10 classification model on multicenter clinical text while protecting data privacy. The model's performance was better than that of models that were trained locally.


Similar Articles

1
Training a Deep Contextualized Language Model for International Classification of Diseases, 10th Revision Classification via Federated Learning: Model Development and Validation Study.
JMIR Med Inform. 2022 Nov 10;10(11):e41342. doi: 10.2196/41342.
2
Automatic International Classification of Diseases Coding System: Deep Contextualized Language Model With Rule-Based Approaches.
JMIR Med Inform. 2022 Jun 29;10(6):e37557. doi: 10.2196/37557.
3
Multi-Label Classification in Patient-Doctor Dialogues With the RoBERTa-WWM-ext + CNN (Robustly Optimized Bidirectional Encoder Representations From Transformers Pretraining Approach With Whole Word Masking Extended Combining a Convolutional Neural Network) Model: Named Entity Study.
JMIR Med Inform. 2022 Apr 21;10(4):e35606. doi: 10.2196/35606.
4
Automatic ICD-10 Coding and Training System: Deep Neural Network Based on Supervised Learning.
JMIR Med Inform. 2021 Aug 31;9(8):e23230. doi: 10.2196/23230.
5
Identification of Semantically Similar Sentences in Clinical Notes: Iterative Intermediate Training Using Multi-Task Learning.
JMIR Med Inform. 2020 Nov 27;8(11):e22508. doi: 10.2196/22508.
6
Evaluating a Natural Language Processing-Driven, AI-Assisted International Classification of Diseases, 10th Revision, Clinical Modification, Coding System for Diagnosis Related Groups in a Real Hospital Environment: Algorithm Development and Validation Study.
J Med Internet Res. 2024 Sep 20;26:e58278. doi: 10.2196/58278.
7
BERT-based Ranking for Biomedical Entity Normalization.
AMIA Jt Summits Transl Sci Proc. 2020 May 30;2020:269-277. eCollection 2020.
8
Modified Bidirectional Encoder Representations From Transformers Extractive Summarization Model for Hospital Information Systems Based on Character-Level Tokens (AlphaBERT): Development and Performance Evaluation.
JMIR Med Inform. 2020 Apr 29;8(4):e17787. doi: 10.2196/17787.
9
Classifying social determinants of health from unstructured electronic health records using deep learning-based natural language processing.
J Biomed Inform. 2022 Mar;127:103984. doi: 10.1016/j.jbi.2021.103984. Epub 2022 Jan 7.
10
A comparative study on deep learning models for text classification of unstructured medical notes with various levels of class imbalance.
BMC Med Res Methodol. 2022 Jul 2;22(1):181. doi: 10.1186/s12874-022-01665-y.

Cited By

1
Large language models in physical therapy: time to adapt and adept.
Front Public Health. 2024 May 24;12:1364660. doi: 10.3389/fpubh.2024.1364660. eCollection 2024.
2
From explainable to interpretable deep learning for natural language processing in healthcare: How far from reality?
Comput Struct Biotechnol J. 2024 May 9;24:362-373. doi: 10.1016/j.csbj.2024.05.004. eCollection 2024 Dec.
3
Road traffic death coding quality in the WHO Mortality Database.
Bull World Health Organ. 2023 Oct 1;101(10):637-648. doi: 10.2471/BLT.23.289683. Epub 2023 Aug 22.

References

1
Automatic International Classification of Diseases Coding System: Deep Contextualized Language Model With Rule-Based Approaches.
JMIR Med Inform. 2022 Jun 29;10(6):e37557. doi: 10.2196/37557.
2
Automatic ICD-10 Coding and Training System: Deep Neural Network Based on Supervised Learning.
JMIR Med Inform. 2021 Aug 31;9(8):e23230. doi: 10.2196/23230.
3
Two-stage Federated Phenotyping and Patient Representation Learning.
Proc Conf Assoc Comput Linguist Meet. 2019 Aug;2019:283-291. doi: 10.18653/v1/W19-5030.
4
Automatic multilabel detection of ICD10 codes in Dutch cardiology discharge letters using neural networks.
NPJ Digit Med. 2021 Feb 26;4(1):37. doi: 10.1038/s41746-021-00404-9.
5
A narrative review of the impact of the transition to ICD-10 and ICD-10-CM/PCS.
JAMIA Open. 2019 Dec 26;3(1):126-131. doi: 10.1093/jamiaopen/ooz066. eCollection 2020 Apr.
6
BioBERT: a pre-trained biomedical language representation model for biomedical text mining.
Bioinformatics. 2020 Feb 15;36(4):1234-1240. doi: 10.1093/bioinformatics/btz682.
7
Impact of the transition to ICD-10 on Medicare inpatient hospital payments.
Medicare Medicaid Res Rev. 2011 Jun 6;1(2):001.02.a02. doi: 10.5600/mmrr.001.02.a02.