GPT-Driven Radiology Report Generation with Fine-Tuned Llama 3.

Authors

Voinea Ștefan-Vlad, Mămuleanu Mădălin, Teică Rossy Vlăduț, Florescu Lucian Mihai, Selișteanu Dan, Gheonea Ioana Andreea

Affiliations

Department of Automatic Control and Electronics, University of Craiova, 200585 Craiova, Romania.

Doctoral School, University of Medicine and Pharmacy of Craiova, 200349 Craiova, Romania.

Publication information

Bioengineering (Basel). 2024 Oct 18;11(10):1043. doi: 10.3390/bioengineering11101043.

DOI:10.3390/bioengineering11101043
PMID:39451418
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11504957/
Abstract

The integration of deep learning into radiology has the potential to enhance diagnostic processes, yet its acceptance in clinical practice remains limited due to various challenges. This study aimed to develop and evaluate a fine-tuned large language model (LLM), based on Llama 3-8B, to automate the generation of accurate and concise conclusions in magnetic resonance imaging (MRI) and computed tomography (CT) radiology reports, thereby assisting radiologists and improving reporting efficiency. A dataset comprising 15,000 radiology reports was collected from the University of Medicine and Pharmacy of Craiova's Imaging Center, covering a diverse range of MRI and CT examinations made by four experienced radiologists. The Llama 3-8B model was fine-tuned using transfer-learning techniques, incorporating parameter quantization to 4-bit precision and low-rank adaptation (LoRA) with a rank of 16 to optimize computational efficiency on consumer-grade GPUs. The model was trained over five epochs using an NVIDIA RTX 3090 GPU, with intermediary checkpoints saved for monitoring. Performance was evaluated quantitatively using Bidirectional Encoder Representations from Transformers Score (BERTScore), Recall-Oriented Understudy for Gisting Evaluation (ROUGE), Bilingual Evaluation Understudy (BLEU), and Metric for Evaluation of Translation with Explicit Ordering (METEOR) metrics on a held-out test set. Additionally, a qualitative assessment was conducted, involving 13 independent radiologists who participated in a Turing-like test and provided ratings for the AI-generated conclusions. The fine-tuned model demonstrated strong quantitative performance, achieving a BERTScore F1 of 0.8054, a ROUGE-1 F1 of 0.4998, a ROUGE-L F1 of 0.4628, and a METEOR score of 0.4282. 
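To make the ROUGE figures above more concrete, here is a minimal sketch of how ROUGE-1 and ROUGE-L F1 scores can be computed for a single candidate/reference pair. It assumes lowercase whitespace tokenization and a plain harmonic-mean F1; the abstract does not state which tokenizer, stemming, or ROUGE implementation the authors used, so this is an illustration of the metric rather than a reproduction of their evaluation. The function names and the example sentences are hypothetical.

```python
from collections import Counter


def _f1(matches: int, cand_len: int, ref_len: int) -> float:
    """Harmonic mean of precision (matches/cand_len) and recall (matches/ref_len)."""
    if matches == 0 or cand_len == 0 or ref_len == 0:
        return 0.0
    precision = matches / cand_len
    recall = matches / ref_len
    return 2 * precision * recall / (precision + recall)


def rouge_1_f1(candidate: str, reference: str) -> float:
    """Unigram-overlap F1 (assumption: lowercase whitespace tokens)."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    # Counter intersection clips each token to its minimum count in both texts.
    overlap = sum((Counter(cand) & Counter(ref)).values())
    return _f1(overlap, len(cand), len(ref))


def lcs_length(a: list, b: list) -> int:
    """Length of the longest common subsequence, via row-by-row dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate: str, reference: str) -> float:
    """Longest-common-subsequence F1, which rewards matching word order."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    return _f1(lcs_length(cand, ref), len(cand), len(ref))


# Hypothetical conclusion texts, not taken from the study's dataset.
generated = "no acute intracranial hemorrhage"
written = "no evidence of acute intracranial hemorrhage"
print(rouge_1_f1(generated, written), rouge_l_f1(generated, written))
```

ROUGE-1 ignores word order, while ROUGE-L drops when the shared words appear in a different order, which is one reason the two scores can differ for the same pair of texts.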
In the human evaluation, the artificial intelligence (AI)-generated conclusions were preferred over human-written ones in approximately 21.8% of cases, indicating that the model's outputs were competitive with those of experienced radiologists. The average rating of the AI-generated conclusions was 3.65 out of 5, reflecting a generally favorable assessment. Notably, the model maintained its consistency across various types of reports and demonstrated the ability to generalize to unseen data. The fine-tuned Llama 3-8B model effectively generates accurate and coherent conclusions for MRI and CT radiology reports. By automating the conclusion-writing process, this approach can assist radiologists in reducing their workload and enhancing report consistency, potentially addressing some barriers to the adoption of deep learning in clinical practice. The positive evaluations from independent radiologists underscore the model's potential utility. While the model demonstrated strong performance, limitations such as dataset bias, limited sample diversity, a lack of clinical judgment, and the need for large computational resources require further refinement and real-world validation. Future work should explore the integration of such models into clinical workflows, address ethical and legal considerations, and extend this approach to generate complete radiology reports.
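The abstract's choice of rank-16 LoRA with 4-bit quantization can be motivated with some back-of-the-envelope arithmetic. The sketch below counts the parameters LoRA adds to one weight matrix and estimates the storage of 4-bit weights. The 4096 x 4096 matrix shape and the round figure of 8 billion parameters are illustrative assumptions, as are the function names; the abstract does not say which layers were adapted, and the estimate ignores quantization constants, activations, and optimizer state.

```python
def lora_added_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters LoRA adds for one d_out x d_in weight matrix.

    The frozen weight W is left untouched; LoRA learns two small matrices,
    B (d_out x rank) and A (rank x d_in), whose product B @ A is the update.
    """
    return rank * (d_in + d_out)


def quantized_weight_bytes(n_params: int, bits: int = 4) -> int:
    """Bytes needed to store n_params weights at the given bit width.

    Ignores per-block quantization constants (assumption for simplicity).
    """
    return n_params * bits // 8


# Assumed shape of a single square projection matrix (illustrative only).
d = 4096
full = d * d                                # parameters in the frozen matrix
added = lora_added_params(d, d, rank=16)    # trainable LoRA parameters
print(f"full matrix: {full:,} params, LoRA adds: {added:,} ({added / full:.2%})")

# Round-number assumption for an 8B-parameter model.
weights_gb = quantized_weight_bytes(8_000_000_000) / 1e9
print(f"4-bit weights: about {weights_gb:.1f} GB")
```

Under these assumptions the trainable update is under 1% of the size of the matrix it adapts, and the quantized weights take roughly 4 GB, which is consistent with the abstract's claim that training fits on a single consumer-grade GPU such as the RTX 3090.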

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8c5e/11504957/47eecc269377/bioengineering-11-01043-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8c5e/11504957/b14933721d5a/bioengineering-11-01043-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8c5e/11504957/3b14a4b58441/bioengineering-11-01043-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8c5e/11504957/e61264a90cca/bioengineering-11-01043-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8c5e/11504957/617ff9466837/bioengineering-11-01043-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8c5e/11504957/8c62693573a4/bioengineering-11-01043-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8c5e/11504957/8ac4ad60c579/bioengineering-11-01043-g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8c5e/11504957/3d68d48abced/bioengineering-11-01043-g008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8c5e/11504957/508f7dde0cf2/bioengineering-11-01043-g009.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8c5e/11504957/6b8e374ff80e/bioengineering-11-01043-g010.jpg