
Leveraging Large Language Models for Knowledge-free Weak Supervision in Clinical Natural Language Processing.

Authors

Hsu Enshuo, Roberts Kirk

Affiliation

University of Texas Health Science Center at Houston.

Publication

Res Sq. 2024 Jun 28:rs.3.rs-4559971. doi: 10.21203/rs.3.rs-4559971/v1.

DOI: 10.21203/rs.3.rs-4559971/v1
PMID: 38978609
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11230489/
Abstract

The performance of deep learning-based natural language processing systems depends on large amounts of labeled training data, which, in the clinical domain, are not easily available or affordable. Weak supervision and in-context learning offer partial solutions to this issue, particularly using large language models (LLMs), but their performance still trails traditional supervised methods trained on moderate amounts of gold-standard data. Moreover, inference with LLMs is computationally heavy. We propose an approach that combines LLM fine-tuning with weak supervision, requires virtually no domain knowledge, and still achieves consistently dominant performance. Using a prompt-based approach, the LLM generates weakly labeled data for training a downstream BERT model. The weakly supervised model is then further fine-tuned on small amounts of gold-standard data. We evaluate this approach using Llama2 on three different n2c2 datasets. With no more than 10 gold-standard notes, our final BERT models weakly supervised by fine-tuned Llama2-13B consistently outperformed out-of-the-box PubMedBERT by 4.7-47.9% in F1 score. With only 50 gold-standard notes, our models achieved performance close to fully fine-tuned systems.
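The core of the pipeline described above is the prompt-based weak-labeling step: the LLM extracts mentions from a clinical note, and these extractions are projected back onto tokens as labels for training the downstream BERT model. The following is a minimal sketch of that step, not the paper's implementation: `llm_generate` is a hypothetical stand-in for a fine-tuned Llama2-13B call, the prompt wording and the single-token `B-DRUG` tagging are illustrative simplifications, and multi-token mentions are not handled.

```python
# Sketch of prompt-based weak labeling for clinical NER, under the
# assumptions stated above (stubbed LLM, single-token mentions only).

def build_prompt(note: str) -> str:
    """Wrap a clinical note in a simple extraction prompt (illustrative wording)."""
    return (
        "Extract all medication mentions from the clinical note below.\n"
        "Return one mention per line.\n\n"
        f"Note: {note}\nMentions:"
    )

def llm_generate(prompt: str) -> str:
    """Placeholder for the fine-tuned LLM; returns a canned response here."""
    return "metformin\nlisinopril"

def weak_bio_labels(note: str, response: str) -> list[tuple[str, str]]:
    """Project LLM-extracted mentions back onto tokens as BIO-style tags.

    These weak (token, tag) pairs would then serve as training data
    for a downstream BERT token-classification model.
    """
    mentions = {m.strip().lower() for m in response.splitlines() if m.strip()}
    labels = []
    for tok in note.split():
        clean = tok.strip(".,;").lower()  # crude punctuation stripping
        labels.append((tok, "B-DRUG" if clean in mentions else "O"))
    return labels

note = "Patient continues metformin and lisinopril daily."
tags = weak_bio_labels(note, llm_generate(build_prompt(note)))
```

In the paper's setting, the weakly labeled corpus produced this way trains a PubMedBERT model, which is then fine-tuned on a handful of gold-standard notes.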

Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3060/11230489/ab616dc08bf7/nihpp-rs4559971v1-f0001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3060/11230489/095bf1327915/nihpp-rs4559971v1-f0002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3060/11230489/e4f6fedbad99/nihpp-rs4559971v1-f0003.jpg

Similar Articles

[1] Leveraging Large Language Models for Knowledge-free Weak Supervision in Clinical Natural Language Processing. Res Sq. 2024-6-28
[2] Leveraging large language models for knowledge-free weak supervision in clinical natural language processing. Sci Rep. 2025-3-10
[3] A dataset and benchmark for hospital course summarization with adapted large language models. J Am Med Inform Assoc. 2025-3-1
[4] Classifying the lifestyle status for Alzheimer's disease from clinical notes using deep learning with weak supervision. BMC Med Inform Decis Mak. 2022-7-7
[5] Identification of asthma control factor in clinical notes using a hybrid deep learning model. BMC Med Inform Decis Mak. 2021-11-9
[6] Automated Transformation of Unstructured Cardiovascular Diagnostic Reports into Structured Datasets Using Sequentially Deployed Large Language Models. medRxiv. 2024-10-8
[7] Leveraging weak supervision to perform named entity recognition in electronic health records progress notes to identify the ophthalmology exam. Int J Med Inform. 2022-11
[8] On the role of the UMLS in supporting diagnosis generation proposed by Large Language Models. J Biomed Inform. 2024-9
[9] Weakly supervised spatial relation extraction from radiology reports. JAMIA Open. 2023-4-22
[10] Automated Pathologic TN Classification Prediction and Rationale Generation From Lung Cancer Surgical Pathology Reports Using a Large Language Model Fine-Tuned With Chain-of-Thought: Algorithm Development and Validation Study. JMIR Med Inform. 2024-12-20

