

MT-clinical BERT: scaling clinical information extraction with multitask learning.

Affiliations

Computer Science Department, Virginia Commonwealth University, Richmond, Virginia, USA.

Information Sciences and Technology, George Mason University, Fairfax, Virginia, USA.

Publication Information

J Am Med Inform Assoc. 2021 Sep 18;28(10):2108-2115. doi: 10.1093/jamia/ocab126.

DOI: 10.1093/jamia/ocab126
PMID: 34333635
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8449623/
Abstract

OBJECTIVE

Clinical notes contain an abundance of important, but not readily accessible, information about patients. Systems that automatically extract this information rely on large amounts of training data, and there are limited resources available to create such data. Furthermore, these systems are developed disjointly, meaning that no information can be shared among task-specific systems. This bottleneck unnecessarily complicates practical application, caps the performance of each individual solution, and incurs the engineering debt of managing multiple information extraction systems.

MATERIALS AND METHODS

We address these challenges by developing Multitask-Clinical BERT: a single deep learning model that simultaneously performs 8 clinical tasks spanning entity extraction, personal health information identification, language entailment, and similarity by sharing representations among tasks.
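The shared-representation design described here is commonly called hard parameter sharing: one encoder serves all tasks, and only a small output head is task-specific. The following is a minimal NumPy sketch of that idea, not the paper's implementation — a single weight matrix stands in for the Clinical BERT encoder, and the task names and label counts are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 8  # hidden size of the shared encoder (Clinical BERT uses 768)

# Shared "encoder": one weight matrix standing in for Clinical BERT.
# In the real model these parameters receive gradients from every task.
W_shared = rng.normal(size=(D, D))

# Task-specific heads: output dimension = number of labels per task.
# Task names here are illustrative stand-ins for the 8 clinical tasks.
tasks = {"ner": 5, "phi": 2, "entailment": 3, "similarity": 1}
heads = {t: rng.normal(size=(D, n)) for t, n in tasks.items()}

def forward(x, task):
    h = np.tanh(x @ W_shared)  # shared representation, reused by all tasks
    return h @ heads[task]     # cheap task-specific projection

x = rng.normal(size=(2, D))    # batch of 2 pooled text representations
print(forward(x, "entailment").shape)  # (2, 3)
```

Because the expensive encoder forward pass is shared, one pass over a clinical note can feed all task heads at once — which is the source of the inference-time computational savings the abstract reports.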

RESULTS

We compare the performance of our multitasking information extraction system to state-of-the-art BERT sequential fine-tuning baselines. We observe a slight but consistent performance degradation in MT-Clinical BERT relative to sequential fine-tuning.

DISCUSSION

These results intuitively suggest that learning a general clinical text representation capable of supporting multiple tasks has the downside of losing the ability to exploit dataset or clinical note-specific properties when compared to a single, task-specific model.

CONCLUSIONS

We find that our single system performs competitively with all state-of-the-art task-specific systems while also enjoying substantial computational savings at inference.


Similar Articles

1. MT-clinical BERT: scaling clinical information extraction with multitask learning.
   J Am Med Inform Assoc. 2021 Sep 18;28(10):2108-2115. doi: 10.1093/jamia/ocab126.
2. The 2019 n2c2/OHNLP Track on Clinical Semantic Textual Similarity: Overview.
   JMIR Med Inform. 2020 Nov 27;8(11):e23375. doi: 10.2196/23375.
3. Extracting comprehensive clinical information for breast cancer using deep learning methods.
   Int J Med Inform. 2019 Dec;132:103985. doi: 10.1016/j.ijmedinf.2019.103985. Epub 2019 Oct 2.
4. A Fine-Tuned Bidirectional Encoder Representations From Transformers Model for Food Named-Entity Recognition: Algorithm Development and Validation.
   J Med Internet Res. 2021 Aug 9;23(8):e28229. doi: 10.2196/28229.
5. Automatic extraction of cancer registry reportable information from free-text pathology reports using multitask convolutional neural networks.
   J Am Med Inform Assoc. 2020 Jan 1;27(1):89-98. doi: 10.1093/jamia/ocz153.
6. Application of Entity-BERT model based on neuroscience and brain-like cognition in electronic medical record entity recognition.
   Front Neurosci. 2023 Sep 20;17:1259652. doi: 10.3389/fnins.2023.1259652. eCollection 2023.
7. Identification of Semantically Similar Sentences in Clinical Notes: Iterative Intermediate Training Using Multi-Task Learning.
   JMIR Med Inform. 2020 Nov 27;8(11):e22508. doi: 10.2196/22508.
8. Drug knowledge discovery via multi-task learning and pre-trained models.
   BMC Med Inform Decis Mak. 2021 Nov 16;21(Suppl 9):251. doi: 10.1186/s12911-021-01614-7.
9. Extracting clinical named entity for pituitary adenomas from Chinese electronic medical records.
   BMC Med Inform Decis Mak. 2022 Mar 23;22(1):72. doi: 10.1186/s12911-022-01810-z.
10. Semantic Textual Similarity in Japanese Clinical Domain Texts Using BERT.
    Methods Inf Med. 2021 Jun;60(S 01):e56-e64. doi: 10.1055/s-0041-1731390. Epub 2021 Jul 8.

Cited By

1. A novel dual embedding few-shot learning approach for classifying bone loss using orthopantomogram radiographic notes.
   Head Face Med. 2025 Jul 11;21(1):49. doi: 10.1186/s13005-025-00528-3.
2. Artificial intelligence technology in ophthalmology public health: current applications and future directions.
   Front Cell Dev Biol. 2025 Apr 17;13:1576465. doi: 10.3389/fcell.2025.1576465. eCollection 2025.
3. RAMIE: retrieval-augmented multi-task information extraction with large language models on dietary supplements.
   J Am Med Inform Assoc. 2025 Mar 1;32(3):545-554. doi: 10.1093/jamia/ocaf002.
4. MISDP: multi-task fusion visit interval for sequential diagnosis prediction.
   BMC Bioinformatics. 2024 Dec 20;25(1):387. doi: 10.1186/s12859-024-05998-x.
5. Multimodal Large Language Models in Health Care: Applications, Challenges, and Future Outlook.
   J Med Internet Res. 2024 Sep 25;26:e59505. doi: 10.2196/59505.
6. A Case Demonstration of the Open Health Natural Language Processing Toolkit From the National COVID-19 Cohort Collaborative and the Researching COVID to Enhance Recovery Programs for a Natural Language Processing System for COVID-19 or Postacute Sequelae of SARS CoV-2 Infection: Algorithm Development and Validation.
   JMIR Med Inform. 2024 Sep 9;12:e49997. doi: 10.2196/49997.
7. Symptom-BERT: Enhancing Cancer Symptom Detection in EHR Clinical Notes.
   J Pain Symptom Manage. 2024 Aug;68(2):190-198.e1. doi: 10.1016/j.jpainsymman.2024.05.015. Epub 2024 May 23.
8. Predicting relations between SOAP note sections: The value of incorporating a clinical information model.
   J Biomed Inform. 2023 May;141:104360. doi: 10.1016/j.jbi.2023.104360. Epub 2023 Apr 14.
9. Year 2021: COVID-19, Information Extraction and BERTization among the Hottest Topics in Medical Natural Language Processing.
   Yearb Med Inform. 2022 Aug;31(1):254-260. doi: 10.1055/s-0042-1742547. Epub 2022 Dec 4.
10. Classifying unstructured electronic consult messages to understand primary care physician specialty information needs.
    J Am Med Inform Assoc. 2022 Aug 16;29(9):1607-1617. doi: 10.1093/jamia/ocac092.

References

1. Family history information extraction via deep joint learning.
   BMC Med Inform Decis Mak. 2019 Dec 27;19(Suppl 10):277. doi: 10.1186/s12911-019-0995-5.
2. 2018 n2c2 shared task on adverse drug events and medication extraction in electronic health records.
   J Am Med Inform Assoc. 2020 Jan 1;27(1):3-12. doi: 10.1093/jamia/ocz166.
3. Cross-type biomedical named entity recognition with deep multi-task learning.
   Bioinformatics. 2019 May 15;35(10):1745-1752. doi: 10.1093/bioinformatics/bty869.
4. A neural network multi-task learning approach to biomedical named entity recognition.
   BMC Bioinformatics. 2017 Aug 15;18(1):368. doi: 10.1186/s12859-017-1776-8.
5. A neural joint model for entity and relation extraction from biomedical text.
   BMC Bioinformatics. 2017 Mar 31;18(1):198. doi: 10.1186/s12859-017-1609-9.
6. Recognizing Question Entailment for Medical Question Answering.
   AMIA Annu Symp Proc. 2017 Feb 10;2016:310-318. eCollection 2016.
7. MIMIC-III, a freely accessible critical care database.
   Sci Data. 2016 May 24;3:160035. doi: 10.1038/sdata.2016.35.
8. Automated systems for the de-identification of longitudinal clinical narratives: Overview of 2014 i2b2/UTHealth shared task Track 1.
   J Biomed Inform. 2015 Dec;58 Suppl(Suppl):S11-S19. doi: 10.1016/j.jbi.2015.06.007. Epub 2015 Jul 28.
9. Evaluating temporal relations in clinical text: 2012 i2b2 Challenge.
   J Am Med Inform Assoc. 2013 Sep-Oct;20(5):806-13. doi: 10.1136/amiajnl-2013-001628. Epub 2013 Apr 5.
10. 2010 i2b2/VA challenge on concepts, assertions, and relations in clinical text.
    J Am Med Inform Assoc. 2011 Sep-Oct;18(5):552-6. doi: 10.1136/amiajnl-2011-000203. Epub 2011 Jun 16.