Clinical concept annotation with contextual word embedding in active transfer learning environment.

Authors

Abbas Asim, Lee Mark, Shanavas Niloofer, Kovatchev Venelin

Affiliations

School of Computer Science, University of Birmingham, Birmingham, UK.

School of Computer Science, University of Birmingham, Abu Dhabi, United Arab Emirates.

Publication details

Digit Health. 2024 Dec 19;10:20552076241308987. doi: 10.1177/20552076241308987. eCollection 2024 Jan-Dec.

DOI:10.1177/20552076241308987
PMID:39711738
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11660282/

Abstract

OBJECTIVE

The study presents an active learning approach that automatically extracts clinical concepts from unstructured data and classifies them into explicit categories such as Problem, Treatment, and Test while preserving high precision and recall. The approach is demonstrated through experiments on the public i2b2 datasets.

METHODS

Initial labeled data are first acquired with a lexical-based approach, in amounts sufficient to start the active learning process. A contextual word embedding similarity approach then uses BERT-based models such as ClinicalBERT, DistilBERT, and SCIBERT to automatically classify unlabeled clinical concepts into explicit categories. In addition, deep learning models and large language models (LLMs) are trained on the labeled data acquired through active learning.
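The embedding-similarity step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how similarity is aggregated, so comparing against a per-category centroid, the `threshold` parameter, and the function names are all assumptions, and the three-dimensional vectors are toy stand-ins for real BERT embeddings.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def classify_by_similarity(concept_vec, seed_embeddings, threshold=0.5):
    """Assign the category whose seed centroid is most similar to concept_vec.

    seed_embeddings maps a category name to a list of embedding vectors
    (hypothetical stand-ins for embeddings of lexically labeled concepts).
    Returns (None, score) when the best score is below the threshold, which
    marks the concept as a candidate for manual review (an assumption).
    """
    best_label, best_score = None, -1.0
    for label, vectors in seed_embeddings.items():
        centroid = np.mean(vectors, axis=0)
        score = cosine_similarity(concept_vec, centroid)
        if score > best_score:
            best_label, best_score = label, score
    if best_score < threshold:
        return None, best_score
    return best_label, best_score

# Toy 3-dimensional vectors; real contextual embeddings have hundreds of dimensions.
seed_embeddings = {
    "Problem": [np.array([1.0, 0.1, 0.0]), np.array([0.9, 0.2, 0.1])],
    "Treatment": [np.array([0.1, 1.0, 0.0]), np.array([0.0, 0.9, 0.2])],
    "Test": [np.array([0.0, 0.1, 1.0]), np.array([0.2, 0.0, 0.9])],
}

label, score = classify_by_similarity(np.array([0.8, 0.3, 0.1]), seed_embeddings)
print(label, round(score, 3))
```

In this sketch a concept that falls below the threshold is returned unlabeled rather than forced into a category, which is one plausible way to route uncertain cases to a human annotator in an active learning loop.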

RESULTS

Using i2b2 datasets (426 clinical notes), the lexical-based method achieved precision, recall, and F1-score of 76%, 70%, and 73%, respectively. SCIBERT performed best in active transfer learning, with precision of 70.84%, recall of 77.40%, F1-score of 73.97%, and accuracy of 69.30%, surpassing its counterpart models. Among deep learning models, convolutional neural networks (CNNs) trained with embeddings (BERTBase, DistilBERT, SCIBERT, ClinicalBERT) achieved training accuracies of 92-95% and testing accuracies of 89-93%, higher than the other deep learning models. The LLMs were also evaluated individually; ClinicalBERT achieved the highest performance, with a training accuracy of 98.4% and a testing accuracy of 96%.
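The reported F1-scores follow from the reported precision and recall through the standard harmonic-mean definition, F1 = 2PR / (P + R). The short check below uses only the figures quoted above; the function name `f1_score` is chosen here for illustration.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both on the same scale)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# SCIBERT in active transfer learning: precision 70.84%, recall 77.40%
print(round(f1_score(70.84, 77.40), 2))  # 73.97

# Lexical-based method: precision 76%, recall 70%
print(round(f1_score(76, 70)))  # 73
```

Both values agree with the F1-scores given in the abstract after rounding.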

CONCLUSIONS

The proposed methodology enhances clinical concept extraction by integrating active learning with models such as SCIBERT and CNN. It improves annotation efficiency while maintaining high accuracy, showing potential for clinical applications.
