Performance and Reproducibility of Large Language Models in Named Entity Recognition: Considerations for the Use in Controlled Environments.

Authors

Dietrich Jürgen, Hollstein André

Affiliation

Pharmaceuticals, Medical Affairs and Pharmacovigilance, Data Science and Insights, Bayer AG, Müllerstr. 178, 13353, Berlin, Germany.

Publication

Drug Saf. 2025 Mar;48(3):287-303. doi: 10.1007/s40264-024-01499-1. Epub 2024 Dec 11.

DOI:10.1007/s40264-024-01499-1
PMID:39661234
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11829833/
Abstract

INTRODUCTION

Recent artificial intelligence (AI) advances can generate human-like responses to a wide range of queries, making them a useful tool for healthcare applications. Therefore, the potential use of large language models (LLMs) in controlled environments regarding efficacy, reproducibility, and operability will be of paramount interest.

OBJECTIVE

We investigated if and how GPT 3.5 and GPT 4 models can be directly used as a part of a GxP validated system and compared the performance of externally hosted GPT 3.5 and GPT 4 against LLMs, which can be hosted internally. We explored zero-shot LLM performance for named entity recognition (NER) and relation extraction tasks, investigated which LLM has the best zero-shot performance to be used potentially for generating training data proposals, evaluated the LLM performance of seven entities for medical NER in zero-shot experiments, selected one model for further performance improvement (few-shot and fine-tuning: Zephyr-7b-beta), and investigated how smaller open-source LLMs perform in contrast to GPT models and to a small fine-tuned T5 Base.

METHODS

We performed reproducibility experiments to evaluate if LLMs can be used in controlled environments and utilized guided generation to use the same prompt across multiple models. Few-shot learning and quantized low rank adapter (QLoRA) fine-tuning were applied to further improve LLM performance.
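A reproducibility experiment of this kind can be sketched as repeated calls with an identical prompt, followed by a comparison of the outputs. The abstract does not describe the authors' actual protocol, so the following is only a minimal illustration: `reproducibility_rate`, `stable_model`, and `make_drifting_model` are hypothetical names, and the two stub functions stand in for real model calls.

```python
from collections import Counter
from itertools import cycle
from typing import Callable


def reproducibility_rate(generate: Callable[[str], str], prompt: str, runs: int = 10) -> float:
    """Return the fraction of runs that produced the most frequent output."""
    outputs = [generate(prompt) for _ in range(runs)]
    most_common_count = Counter(outputs).most_common(1)[0][1]
    return most_common_count / runs


def stable_model(prompt: str) -> str:
    """Hypothetical stub: always returns the same annotation."""
    return '{"drug": ["aspirin"], "event": ["headache"]}'


def make_drifting_model() -> Callable[[str], str]:
    """Hypothetical stub: cycles through slightly different annotations."""
    variants = cycle([
        '{"drug": ["aspirin"], "event": ["headache"]}',
        '{"drug": ["aspirin"], "event": ["headache"]}',
        '{"drug": ["aspirin"], "event": []}',
        '{"drug": ["Aspirin"], "event": ["headache"]}',
    ])

    def generate(prompt: str) -> str:
        return next(variants)

    return generate


if __name__ == "__main__":
    prompt = "Extract drugs and adverse events from: ..."
    print(reproducibility_rate(stable_model, prompt, runs=8))
    print(reproducibility_rate(make_drifting_model(), prompt, runs=8))
```

Exact string equality is a deliberately strict criterion here; a validated system might instead compare parsed entities, which is a design choice the abstract leaves open.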

RESULTS AND CONCLUSION

We demonstrated that zero-shot GPT 4 performance is comparable with a fine-tuned T5, and Zephyr performed better than zero-shot GPT 3.5, but the recognition of product combinations such as product event combination was significantly better by using a fine-tuned T5. Although Open AI launched recently GPT versions to improve the generation of consistent output, both GPT variants failed to demonstrate reproducible results. The lack of reproducibility together with limitations of external hosted systems to keep validated systems in a state of control may affect the use of closed and proprietary models in regulated environments. However, due to the good NER performance, we recommend using GPT for creating annotation proposals for training data as a basis for fine-tuning.
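Performance comparisons such as these are often expressed as entity-level precision, recall, and F1. The abstract does not say which metric the authors used, so the sketch below is an assumption for illustration only; the function name `entity_prf` and the entity types `drug` and `event` are hypothetical, not taken from the paper.

```python
from typing import Set, Tuple

Entity = Tuple[str, str]  # (entity type, surface text)


def entity_prf(gold: Set[Entity], pred: Set[Entity]) -> Tuple[float, float, float]:
    """Exact-match precision, recall, and F1 over sets of entities."""
    true_positives = len(gold & pred)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical gold annotations and model predictions for one document.
    gold = {("drug", "aspirin"), ("event", "headache"), ("event", "nausea")}
    pred = {("drug", "aspirin"), ("event", "headache"), ("event", "rash")}
    print(entity_prf(gold, pred))
```

In this toy case two of three predictions match two of three gold entities, so precision, recall, and F1 all equal 2/3.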

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0776/11829833/e51dbfe3db05/40264_2024_1499_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0776/11829833/2e3de63bf2ca/40264_2024_1499_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0776/11829833/bf7b9bd27a05/40264_2024_1499_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0776/11829833/299e6a3a5216/40264_2024_1499_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0776/11829833/e807e56cfbb9/40264_2024_1499_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0776/11829833/e702699d3dff/40264_2024_1499_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0776/11829833/c6164e46d7d0/40264_2024_1499_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0776/11829833/6bca4eae911a/40264_2024_1499_Fig8_HTML.jpg

Similar articles

1
Performance and Reproducibility of Large Language Models in Named Entity Recognition: Considerations for the Use in Controlled Environments.
Drug Saf. 2025 Mar;48(3):287-303. doi: 10.1007/s40264-024-01499-1. Epub 2024 Dec 11.
2
Prescription of Controlled Substances: Benefits and Risks
3
A dataset and benchmark for hospital course summarization with adapted large language models.
J Am Med Inform Assoc. 2025 Mar 1;32(3):470-479. doi: 10.1093/jamia/ocae312.
4
Implementing Large Language Models in Health Care: Clinician-Focused Review With Interactive Guideline.
J Med Internet Res. 2025 Jul 11;27:e71916. doi: 10.2196/71916.
5
Using a Diverse Test Suite to Assess Large Language Models on Fast Health Care Interoperability Resources Knowledge: Comparative Analysis.
J Med Internet Res. 2025 Aug 12;27:e73540. doi: 10.2196/73540.
6
Advancing entity recognition in biomedicine via instruction tuning of large language models.
Bioinformatics. 2024 Mar 29;40(4). doi: 10.1093/bioinformatics/btae163.
7
Unlocking the Secrets Behind Advanced Artificial Intelligence Language Models in Deidentifying Chinese-English Mixed Clinical Text: Development and Validation Study.
J Med Internet Res. 2024 Jan 25;26:e48443. doi: 10.2196/48443.
8
Large Language Models and Empathy: Systematic Review.
J Med Internet Res. 2024 Dec 11;26:e52597. doi: 10.2196/52597.
9
Toward Cross-Hospital Deployment of Natural Language Processing Systems: Model Development and Validation of Fine-Tuned Large Language Models for Disease Name Recognition in Japanese.
JMIR Med Inform. 2025 Jul 8;13:e76773. doi: 10.2196/76773.
10
Classifying Patient Complaints Using Artificial Intelligence-Powered Large Language Models: Cross-Sectional Study.
J Med Internet Res. 2025 Aug 6;27:e74231. doi: 10.2196/74231.

Cited by

1
Comment on "Performance and Reproducibility of Large Language Models in Named Entity Recognition: Considerations for the Use in Controlled Environments".
Drug Saf. 2025 Sep 2. doi: 10.1007/s40264-025-01592-z.
2
Authors' response to Tiffet et al.'s comment on "Performance and Reproducibility of Large Language Models in Named Entity Recognition: Considerations for the Use in Controlled Environments".
Drug Saf. 2025 Sep 2. doi: 10.1007/s40264-025-01590-1.
