Wei Yishu, Wang Xindi, Ong Hanley, Zhou Yiliang, Flanders Adam, Shih George, Peng Yifan
Department of Population Health Sciences, Weill Cornell Medicine, New York.
Department of Radiology, Weill Cornell Medicine, New York.
AMIA Jt Summits Transl Sci Proc. 2025 Jun 10;2025:614-623. eCollection 2025.
Despite significant progress in applying large language models (LLMs) to the medical domain, several limitations still prevent their practical application. Among these are constraints on model size and the lack of cohort-specific labeled datasets. In this work, we investigated the potential of improving a lightweight LLM, such as Llama 3.1-8B, through fine-tuning with synthetically labeled datasets. Two tasks are jointly trained by combining their respective instruction datasets. When the quality of the task-specific synthetic labels is relatively high (e.g., generated by GPT-4o), Llama 3.1-8B achieves satisfactory performance on the open-ended disease detection task, with a micro F1 score of 0.91. Conversely, when the quality of the task-relevant synthetic labels is relatively low (e.g., from the MIMIC-CXR dataset), fine-tuned Llama 3.1-8B is able to surpass its noisy teacher labels (micro F1 score of 0.67 vs. 0.63) when calibrated against curated labels, indicating the model's strong underlying capability. These findings demonstrate the potential of fine-tuning LLMs with synthetic labels, offering a promising direction for future research on LLM specialization in the medical domain.
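The micro F1 scores reported in the abstract pool true positives, false positives, and false negatives across all samples before computing a single precision/recall pair, which weights frequent disease labels more heavily than macro averaging would. A minimal sketch of this metric for open-ended multi-label disease detection, using hypothetical label sets (not data from the paper):

```python
# Minimal sketch of micro-averaged F1 for multi-label disease detection.
# The label sets below are illustrative examples, not from the paper's data.

def micro_f1(predictions, references):
    """Micro-averaged F1 over multi-label outputs.

    Pools true positives, false positives, and false negatives
    across all samples, then computes a single precision/recall pair.
    """
    tp = fp = fn = 0
    for pred, gold in zip(predictions, references):
        tp += len(pred & gold)   # labels correctly predicted
        fp += len(pred - gold)   # spurious labels
        fn += len(gold - pred)   # missed labels
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical model outputs vs. reference labels for two reports
preds = [{"edema", "pneumonia"}, {"pleural effusion"}]
golds = [{"edema"}, {"pleural effusion", "atelectasis"}]
print(round(micro_f1(preds, golds), 2))  # → 0.67
```

Because counts are pooled globally, a model that performs well on common findings can score high even if it misses rare ones, which is one reason curated labels matter when calibrating against noisy teacher labels.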