
Sample Size Considerations for Fine-Tuning Large Language Models for Named Entity Recognition Tasks: Methodological Study.

Authors

Majdik Zoltan P, Graham S Scott, Shiva Edward Jade C, Rodriguez Sabrina N, Karnes Martha S, Jensen Jared T, Barbour Joshua B, Rousseau Justin F

Affiliations

Department of Communication, North Dakota State University, Fargo, ND, United States.

Department of Rhetoric & Writing, The University of Texas at Austin, Austin, TX, United States.

Publication

JMIR AI. 2024 May 16;3:e52095. doi: 10.2196/52095.


DOI:10.2196/52095
PMID:38875593
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11140272/
Abstract

BACKGROUND: Large language models (LLMs) have the potential to support promising new applications in health informatics. However, practical data on sample size considerations for fine-tuning LLMs to perform specific tasks in biomedical and health policy contexts are lacking.

OBJECTIVE: This study aims to evaluate sample size and sample selection techniques for fine-tuning LLMs to support improved named entity recognition (NER) for a custom data set of conflicts of interest disclosure statements.

METHODS: A random sample of 200 disclosure statements was prepared for annotation. All "PERSON" and "ORG" entities were identified by each of the 2 raters, and once appropriate agreement was established, the annotators independently annotated an additional 290 disclosure statements. From the 490 annotated documents, 2500 stratified random samples in different size ranges were drawn. The 2500 training-set subsamples were used to fine-tune a selection of language models across 2 model architectures (Bidirectional Encoder Representations from Transformers [BERT] and Generative Pre-trained Transformer [GPT]) for improved NER, and multiple regression was used to assess the relationship between sample size (sentences), entity density (entities per sentence [EPS]), and trained model performance (F-score). Additionally, single-predictor threshold regression models were used to evaluate the possibility of diminishing marginal returns from increased sample size or entity density.

RESULTS: Fine-tuned models ranged in topline NER performance from F-score=0.79 to F-score=0.96 across architectures. Two-predictor multiple linear regression models were statistically significant, with multiple R ranging from 0.6057 to 0.7896 (all P<.001). EPS and the number of sentences were significant predictors of F-scores in all cases (P<.001), except for the GPT-2_large model, where EPS was not a significant predictor (P=.184). Model thresholds indicate points of diminishing marginal return from increased training data set sample size measured by the number of sentences, with point estimates ranging from 439 sentences for RoBERTa_large to 527 sentences for GPT-2_large. Likewise, the threshold regression models indicate a diminishing marginal return for EPS, with point estimates between 1.36 and 1.38.

CONCLUSIONS: Relatively modest sample sizes can be used to fine-tune LLMs for NER tasks applied to biomedical text, and training data entity density should representatively approximate entity density in production data. Training data quality and a model architecture's intended use (text generation vs text processing or classification) may be as, or more, important than training data volume and model parameter size.
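The abstract's analysis pairs a two-predictor linear regression (F-score on number of sentences and EPS) with single-predictor threshold regressions to locate the point of diminishing marginal returns. The following is a minimal sketch of that analysis on synthetic data, not the authors' code: the data generator, coefficient values, and the built-in plateau at 450 sentences are all illustrative assumptions, and the threshold is estimated by a simple grid search over breakpoints of a piecewise model that is flat above the threshold.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the paper's data: each row is one fine-tuning run.
n_runs = 500
sentences = rng.uniform(50, 1500, n_runs)   # training-set size in sentences
eps = rng.uniform(0.5, 3.0, n_runs)         # entity density (entities per sentence)

# Illustrative response: F-score rises with both predictors but saturates
# above ~450 sentences and ~1.37 EPS (assumed values, for demonstration only).
f_score = (0.70
           + 0.05 * np.minimum(sentences, 450) / 450
           + 0.03 * np.minimum(eps, 1.37) / 1.37
           + rng.normal(0, 0.01, n_runs))

# Two-predictor multiple linear regression: F-score ~ sentences + EPS.
X = np.column_stack([np.ones(n_runs), sentences, eps])
coef, *_ = np.linalg.lstsq(X, f_score, rcond=None)

# Single-predictor threshold regression on sample size: grid-search the
# breakpoint tau minimizing residual sum of squares for a model that is
# linear below tau and flat above it.
def rss_at(tau):
    x = np.minimum(sentences, tau)
    A = np.column_stack([np.ones(n_runs), x])
    b, *_ = np.linalg.lstsq(A, f_score, rcond=None)
    resid = f_score - A @ b
    return resid @ resid

grid = np.arange(100, 1400, 10)
tau_hat = int(grid[np.argmin([rss_at(t) for t in grid])])
print(f"estimated diminishing-returns threshold: {tau_hat} sentences")
```

The same grid-search step can be repeated with `eps` as the predictor to estimate the EPS threshold; the paper's reported point estimates (439 to 527 sentences, EPS 1.36 to 1.38) came from fitted threshold regression models of this general kind, not from this sketch.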


Similar articles

[1]
Sample Size Considerations for Fine-Tuning Large Language Models for Named Entity Recognition Tasks: Methodological Study.

JMIR AI. 2024-5-16

[2]
Evaluating Medical Entity Recognition in Health Care: Entity Model Quantitative Study.

JMIR Med Inform. 2024-10-17

[3]
Extracting comprehensive clinical information for breast cancer using deep learning methods.

Int J Med Inform. 2019-10-2

[4]
A Fine-Tuned Bidirectional Encoder Representations From Transformers Model for Food Named-Entity Recognition: Algorithm Development and Validation.

J Med Internet Res. 2021-8-9

[5]
Advancing entity recognition in biomedicine via instruction tuning of large language models.

Bioinformatics. 2024-3-29

[6]
Using Large Language Models to Annotate Complex Cases of Social Determinants of Health in Longitudinal Clinical Records.

medRxiv. 2024-4-27

[7]
Transformers-sklearn: a toolkit for medical language understanding with transformer-based models.

BMC Med Inform Decis Mak. 2021-7-30

[8]
Improving large language models for clinical named entity recognition via prompt engineering.

J Am Med Inform Assoc. 2024-9-1

[9]
Multi-Label Classification in Patient-Doctor Dialogues With the RoBERTa-WWM-ext + CNN (Robustly Optimized Bidirectional Encoder Representations From Transformers Pretraining Approach With Whole Word Masking Extended Combining a Convolutional Neural Network) Model: Named Entity Study.

JMIR Med Inform. 2022-4-21

[10]
Fine-Tuning Bidirectional Encoder Representations From Transformers (BERT)-Based Models on Large-Scale Electronic Health Record Notes: An Empirical Study.

JMIR Med Inform. 2019-9-12

Cited by

[1]
AI in conjunctivitis research: assessing ChatGPT and DeepSeek for etiology, intervention, and citation integrity via hallucination rate analysis.

Front Artif Intell. 2025-8-20

[2]
Precision in Parsing: Evaluation of an Open-Source Named Entity Recognizer (NER) in Veterinary Oncology.

Vet Comp Oncol. 2025-3

[3]
DeepEnhancerPPO: An Interpretable Deep Learning Approach for Enhancer Classification.

Int J Mol Sci. 2024-12-2

[4]
Use of artificial intelligence algorithms to analyse systemic sclerosis-interstitial lung disease imaging features.

Rheumatol Int. 2024-10

References

[1]
Large language models in health care: Development, applications, and challenges.

Health Care Sci. 2023-7-24

[2]
Lessons learned from translating AI from development to deployment in healthcare.

Nat Med. 2023-6

[3]
Performance of ChatGPT on USMLE: Potential for AI-assisted medical education using large language models.

PLOS Digit Health. 2023-2-9

[4]
Evidence for stratified conflicts of interest policies in research contexts: a methodological review.

BMJ Open. 2022-9-19

[5]
Tasks as needs: reframing the paradigm of clinical natural language processing research for real-world decision support.

J Am Med Inform Assoc. 2022-9-12

[6]
Associations Between Aggregate NLP-Extracted Conflicts of Interest and Adverse Events by Drug Product.

Stud Health Technol Inform. 2022-6-6

[7]
A systematic review on natural language processing systems for eligibility prescreening in clinical research.

J Am Med Inform Assoc. 2021-12-28

[8]
Benchmarking Modern Named Entity Recognition Techniques for Free-text Health Record Deidentification.

AMIA Jt Summits Transl Sci Proc. 2021

[9]
Structuring clinical text with AI: Old versus new natural language processing techniques evaluated on eight common cardiovascular diseases.

Patterns (N Y). 2021-6-17

[10]
Extracting Drug Names and Associated Attributes From Discharge Summaries: Text Mining Study.

JMIR Med Inform. 2021-5-5
