Health Care Language Models and Their Fine-Tuning for Information Extraction: Scoping Review.

Affiliations

ISTAR, Instituto Universitário de Lisboa (ISCTE-IUL), Lisbon, Portugal.

Select Data, Anaheim, CA, United States.

Publication information

JMIR Med Inform. 2024 Oct 21;12:e60164. doi: 10.2196/60164.

DOI:10.2196/60164
PMID:39432345
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11535799/
Abstract

BACKGROUND

Health care text data is characterized by intricate language, specialized terminology rarely used in everyday life, and frequent abbreviations and acronyms. In response, domain adaptation techniques have emerged as crucial for transformer-based models. Refining the knowledge of language models (LMs) in this way allows for a better understanding of medical textual data, which improves performance on medical downstream tasks such as information extraction (IE). We identified a gap in the literature regarding health care LMs. Therefore, this study presents a scoping literature review investigating domain adaptation methods for transformers in health care, differentiating between English and non-English languages, with a focus on Portuguese. More specifically, we investigated the development of health care LMs, with the aim of comparing Portuguese with other, more developed languages to guide the path of a non-English language with fewer resources.

OBJECTIVE

This study aimed to research health care IE models, regardless of language, to understand the efficacy of transformers and which medical entities are most commonly extracted.

METHODS

This scoping review was conducted using the PRISMA-ScR (Preferred Reporting Items for Systematic reviews and Meta-Analyses extension for Scoping Reviews) methodology on the Scopus and Web of Science Core Collection databases. Only studies that mentioned the creation of health care LMs or health care IE models were included, while large language models (LLMs) were excluded. The latter were not included because we wanted to research LMs and not LLMs, which are architecturally different and have distinct purposes.
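The inclusion rule described above can be sketched as a simple filter. This is only an illustration: the `Study` fields, the `meets_inclusion` helper, and the example records are hypothetical and are not taken from the review's actual screening data.

```python
from dataclasses import dataclass


@dataclass
class Study:
    # All fields are hypothetical screening flags, assumed for illustration.
    title: str
    creates_health_lm: bool        # study creates a health care LM
    creates_health_ie_model: bool  # study creates a health care IE model
    is_llm: bool                   # study concerns a large language model


def meets_inclusion(study: Study) -> bool:
    """Include studies that create a health care LM or IE model, excluding LLMs."""
    in_scope = study.creates_health_lm or study.creates_health_ie_model
    return in_scope and not study.is_llm


# Made-up candidate records, not real studies from the review.
candidates = [
    Study("Hypothetical study A", True, False, False),
    Study("Hypothetical study B", False, True, False),
    Study("Hypothetical study C", True, False, True),
    Study("Hypothetical study D", False, False, False),
]

included = [s.title for s in candidates if meets_inclusion(s)]
print(included)
```

Studies A and B pass the filter; C is dropped because it is an LLM, and D because it creates neither kind of model.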

RESULTS

Our search query retrieved 137 studies, 60 of which met the inclusion criteria; none of them were systematic literature reviews. English and Chinese are the languages with the most health care LMs developed. These languages already have disease-specific LMs, while others only have general health care LMs. European Portuguese does not have any public health care LM and should take examples from other languages to develop, first, general health care LMs and then, in an advanced phase, disease-specific LMs. Regarding IE models, transformers were the most commonly used method, and named entity recognition was the most popular topic, with only a few studies mentioning assertion status or addressing medical lexical problems. The most extracted entities were diagnosis, posology, and symptoms.
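To make the named entity recognition task concrete, the following minimal sketch groups token-level tags into entity spans. It assumes the common BIO tagging scheme (`B-` begins an entity, `I-` continues it, `O` is outside), which the abstract does not specify. The `decode_bio` helper, the label names, and the example sentence are all hypothetical, and the tags are written by hand rather than predicted by a model.

```python
from typing import List, Tuple


def decode_bio(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (entity_type, text) spans."""
    spans: List[Tuple[str, str]] = []
    current_type = None
    current_tokens: List[str] = []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new entity starts; close any span that is still open.
            if current_type is not None:
                spans.append((current_type, " ".join(current_tokens)))
            current_type = tag[2:]
            current_tokens = [token]
        elif tag.startswith("I-") and current_type == tag[2:]:
            # Continuation of the currently open entity.
            current_tokens.append(token)
        else:
            # "O" or an inconsistent I- tag closes any open span;
            # the token itself is not added to any entity.
            if current_type is not None:
                spans.append((current_type, " ".join(current_tokens)))
            current_type = None
            current_tokens = []
    if current_type is not None:
        spans.append((current_type, " ".join(current_tokens)))
    return spans


# Hand-labeled toy example (assumed labels, not model output).
tokens = ["Patient", "reports", "chest", "pain", ";", "started",
          "aspirin", "100", "mg", "daily", "for", "angina"]
tags = ["O", "O", "B-SYMPTOM", "I-SYMPTOM", "O", "O",
        "O", "B-POSOLOGY", "I-POSOLOGY", "I-POSOLOGY", "O", "B-DIAGNOSIS"]

entities = decode_bio(tokens, tags)
print(entities)
```

In a real pipeline the tags would come from a fine-tuned token classification model; only the span-grouping step is shown here.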

CONCLUSIONS

The findings indicate that domain adaptation is beneficial, achieving better results in downstream tasks. Our analysis showed that the use of transformers is more developed for the English and Chinese languages. European Portuguese lacks relevant studies and should draw examples from other non-English languages to develop these models and drive progress in AI. Health care professionals could benefit from having medically relevant information highlighted and the reading of textual data optimized; alternatively, this information could be used to create patient medical timelines, allowing for profiling.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ad6c/11535799/aec70ecee75d/medinform_v12i1e60164_fig2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ad6c/11535799/ed5ece554675/medinform_v12i1e60164_fig1.jpg
