Similar Articles

1. Enhancing Disease Detection in Radiology Reports Through Fine-tuning Lightweight LLM on Weak Labels.
   AMIA Jt Summits Transl Sci Proc. 2025 Jun 10;2025:614-623. eCollection 2025.
2. A dataset and benchmark for hospital course summarization with adapted large language models.
   J Am Med Inform Assoc. 2025 Mar 1;32(3):470-479. doi: 10.1093/jamia/ocae312.
3. Evaluating and Improving Syndrome Differentiation Thinking Ability in Large Language Models: Method Development Study.
   JMIR Med Inform. 2025 Jun 20;13:e75103. doi: 10.2196/75103.
4. Implementing Large Language Models in Health Care: Clinician-Focused Review With Interactive Guideline.
   J Med Internet Res. 2025 Jul 11;27:e71916. doi: 10.2196/71916.
5. BioInstruct: instruction tuning of large language models for biomedical natural language processing.
   J Am Med Inform Assoc. 2024 Sep 1;31(9):1821-1832. doi: 10.1093/jamia/ocae122.
6. Menstrual Health Education Using a Specialized Large Language Model in India: Development and Evaluation Study of MenstLLaMA.
   J Med Internet Res. 2025 Jul 16;27:e71977. doi: 10.2196/71977.
7. Evaluating the effectiveness of biomedical fine-tuning for large language models on clinical tasks.
   J Am Med Inform Assoc. 2025 Jun 1;32(6):1015-1024. doi: 10.1093/jamia/ocaf045.
8. Evaluating and Enhancing Japanese Large Language Models for Genetic Counseling Support: Comparative Study of Domain Adaptation and the Development of an Expert-Evaluated Dataset.
   JMIR Med Inform. 2025 Jan 16;13:e65047. doi: 10.2196/65047.
9. Comparison of Two Modern Survival Prediction Tools, SORG-MLA and METSSS, in Patients With Symptomatic Long-bone Metastases Who Underwent Local Treatment With Surgery Followed by Radiotherapy and With Radiotherapy Alone.
   Clin Orthop Relat Res. 2024 Dec 1;482(12):2193-2208. doi: 10.1097/CORR.0000000000003185. Epub 2024 Jul 23.
10. Fine-tuning medical language models for enhanced long-contextual understanding and domain expertise.
    Quant Imaging Med Surg. 2025 Jun 6;15(6):5450-5462. doi: 10.21037/qims-2024-2655. Epub 2025 Jun 3.

Cited By

1. Large Language Models in Medical Image Analysis: A Systematic Survey and Future Directions.
   Bioengineering (Basel). 2025 Jul 29;12(8):818. doi: 10.3390/bioengineering12080818.
2. CXR-LT 2024: A MICCAI challenge on long-tailed, multi-label, and zero-shot disease classification from chest X-ray.
   Med Image Anal. 2025 Jul 29;106:103739. doi: 10.1016/j.media.2025.103739.
3. CXR-LT 2024: A MICCAI challenge on long-tailed, multi-label, and zero-shot disease classification from chest X-ray.
   ArXiv. 2025 Jun 9:arXiv:2506.07984v1.
4. The Evolution of Radiology Image Annotation in the Era of Large Language Models.
   Radiol Artif Intell. 2025 Jul;7(4):e240631. doi: 10.1148/ryai.240631.

References

1. Closing the gap between open source and commercial large language models for medical evidence summarization.
   NPJ Digit Med. 2024 Sep 9;7(1):239. doi: 10.1038/s41746-024-01239-w.
2. Evaluating GPT-V4 (GPT-4 with Vision) on Detection of Radiologic Findings on Chest Radiographs.
   Radiology. 2024 May;311(2):e233270. doi: 10.1148/radiol.233270.
3. GPT-4: a new era of artificial intelligence in medicine.
   Ir J Med Sci. 2023 Dec;192(6):3197-3200. doi: 10.1007/s11845-023-03377-8. Epub 2023 Apr 19.
4. Extraction of radiographic findings from unstructured thoracoabdominal computed tomography reports using convolutional neural network based natural language processing.
   PLoS One. 2020 Jul 30;15(7):e0236827. doi: 10.1371/journal.pone.0236827. eCollection 2020.
5. NegBio: a high-performance tool for negation and uncertainty detection in radiology reports.
   AMIA Jt Summits Transl Sci Proc. 2018 May 18;2017:188-196. eCollection 2018.
6. Is deidentification sufficient to protect health privacy in research?
   Am J Bioeth. 2010 Sep;10(9):3-11. doi: 10.1080/15265161.2010.494215.

Enhancing Disease Detection in Radiology Reports Through Fine-tuning Lightweight LLM on Weak Labels.

Author Information

Wei Yishu, Wang Xindi, Ong Hanley, Zhou Yiliang, Flanders Adam, Shih George, Peng Yifan

Affiliations

Department of Population Health Sciences, Weill Cornell Medicine, New York.

Department of Radiology, Weill Cornell Medicine, New York.

Publication Information

AMIA Jt Summits Transl Sci Proc. 2025 Jun 10;2025:614-623. eCollection 2025.

PMID: 40502255
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12150749/
Abstract

Despite significant progress in applying large language models (LLMs) to the medical domain, several limitations still prevent them from practical applications. Among these are the constraints on model size and the lack of cohort-specific labeled datasets. In this work, we investigated the potential of improving a lightweight LLM, such as Llama 3.1-8B, through fine-tuning with datasets using synthetic labels. Two tasks are jointly trained by combining their respective instruction datasets. When the quality of the task-specific synthetic labels is relatively high (e.g., generated by GPT-4o), Llama 3.1-8B achieves satisfactory performance on the open-ended disease detection task, with a micro F1 score of 0.91. Conversely, when the quality of the task-relevant synthetic labels is relatively low (e.g., from the MIMIC-CXR dataset), fine-tuned Llama 3.1-8B is able to surpass its noisy teacher labels (micro F1 score of 0.67 vs. 0.63) when calibrated against curated labels, indicating the model's strong underlying capability. These findings demonstrate the potential of fine-tuning LLMs with synthetic labels, offering a promising direction for future research on LLM specialization in the medical domain.
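The abstract reports micro F1 scores for the disease detection task. As a reminder of what that metric measures, micro-averaged F1 pools true positives, false positives, and false negatives across all labels before computing precision and recall, so frequent labels dominate the score. A minimal sketch (the finding labels below are hypothetical, not from the paper's label set):

```python
def micro_f1(gold, pred):
    """Micro-averaged F1 over multi-label predictions.

    gold, pred: lists of label sets, one set per report.
    """
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)  # labels predicted and correct
        fp += len(p - g)  # labels predicted but wrong
        fn += len(g - p)  # labels missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Toy example with hypothetical finding labels
gold = [{"pneumonia", "effusion"}, {"edema"}]
pred = [{"pneumonia"}, {"edema", "effusion"}]
print(round(micro_f1(gold, pred), 3))  # → 0.667
```

Pooling counts first is what distinguishes micro from macro averaging, where F1 would instead be computed per label and then averaged.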

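The abstract notes that the two tasks were trained jointly by combining their respective instruction datasets. The paper's exact data format and mixing procedure are not given; purely as an illustration, joint training data is often built by concatenating the task-specific sets and shuffling them so each fine-tuning batch mixes both tasks (the record schema and contents below are hypothetical):

```python
import random


def build_joint_mix(task_a, task_b, seed=0):
    """Interleave two task-specific instruction datasets into one
    shuffled training mix, so a single fine-tuning run sees both tasks."""
    mixed = list(task_a) + list(task_b)
    random.Random(seed).shuffle(mixed)  # deterministic shuffle for reproducibility
    return mixed


# Hypothetical records in a generic instruction/output schema
detection = [{"instruction": "List the positive findings in this report.",
              "output": "pneumonia; pleural effusion"}]
summaries = [{"instruction": "Summarize this radiology report.",
              "output": "No acute cardiopulmonary abnormality."}]

mix = build_joint_mix(detection, summaries)
```

Seeding the shuffle keeps the mix reproducible across runs; with unequal dataset sizes one might also up- or down-sample a task before mixing, a choice the abstract does not specify.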