Programming Chatbots Using Natural Language: Generating Cervical Spine MRI Impressions.

Authors

Javan Ramin, Kim Theodore, Abdelmonem Ahmed, Ismail Ahmed, Jaamour Farris, Melnyk Oleksiy, Heekin Mary

Affiliations

Department of Radiology, George Washington University School of Medicine and Health Sciences, Washington, D.C., USA.

Department of Research, California Institute of Behavioral Neurosciences & Psychology, Fairfield, USA.

Publication information

Cureus. 2024 Sep 14;16(9):e69410. doi: 10.7759/cureus.69410. eCollection 2024 Sep.

DOI: 10.7759/cureus.69410
PMID: 39403651
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11472864/
Abstract

PURPOSE

The utility of machine learning, specifically large language models (LLMs), in the medical field has gained considerable attention. However, there is a scarcity of studies that focus on the application of LLMs in generating custom subspecialty radiology impressions. The primary objective of this study is to evaluate and compare the performance of multiple LLMs in generating specialized, accurate, and clinically useful radiology impressions for degenerative cervical spine MRI reports.

MATERIALS AND METHODS

The study employed a comparative analysis of multiple LLMs, including OpenAI's ChatGPT-3.5 and GPT-4 (OpenAI, San Francisco, CA), Anthropic's Claude 2 (Anthropic PBC, San Francisco, CA), Google's Bard (Google Inc., Mountain View, CA), and Meta's Llama 2 (Meta Platforms, Inc., Menlo Park, CA). The comparison was performed during January-February 2024. The models were evaluated using a few-shot learning approach on a dataset consisting of 10 examples from 50 synthetically generated MRI reports. The performance metrics evaluated were diagnostic accuracy, stylistic accuracy, and redundancy.
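A few-shot setup of this kind can be sketched as follows. This is a minimal illustration, not the authors' prompt: the `ReportExample` type, the prompt wording, and both function names are hypothetical, and the 10/40 split is an assumption based on the abstract's mention of 10 examples, 50 reports, and 40 cases.

```python
from dataclasses import dataclass


@dataclass
class ReportExample:
    findings: str    # findings section of a synthetic MRI report
    impression: str  # reference impression written in the target style


def split_reports(reports: list[ReportExample], n_examples: int = 10):
    """Assumed split: first n_examples as demonstrations, the rest as test cases."""
    return reports[:n_examples], reports[n_examples:]


def build_few_shot_prompt(examples: list[ReportExample], new_findings: str) -> str:
    """Assemble a plain-text few-shot prompt from worked examples (hypothetical wording)."""
    parts = [
        "Write the IMPRESSION for a degenerative cervical spine MRI report, "
        "matching the style of the demonstrations."
    ]
    for i, ex in enumerate(examples, start=1):
        parts.append(
            f"Example {i}\nFINDINGS: {ex.findings}\nIMPRESSION: {ex.impression}"
        )
    # The prompt ends with an open IMPRESSION field for the model to complete.
    parts.append(f"New report\nFINDINGS: {new_findings}\nIMPRESSION:")
    return "\n\n".join(parts)
```

The resulting string would be sent to whichever chatbot is under test; no vendor API is shown here because the abstract does not say how the models were accessed.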

RESULTS

While Claude 2 maintained consistent high performance across 40 cases, GPT-4 required midway re-training to improve its declining scores. Both Claude 2 and GPT-4 demonstrated the ability to generate structured impressions, but Claude 2's specialized summarization capabilities provided an edge in maintaining accuracy without continuous feedback. The other LLMs' performance was subpar.
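One way to notice a declining score trend like the one described for GPT-4 is to compare recent cases against an early baseline. The sketch below is an assumption-laden illustration: the abstract does not give the scoring scale or any threshold, so the 1-5 scale, the "higher is better" convention for all three metrics, the window size, and the function names are all hypothetical.

```python
from statistics import mean

# The three metrics named in the abstract; the dictionary keys are hypothetical.
METRICS = ("diagnostic_accuracy", "stylistic_accuracy", "redundancy")


def case_score(scores: dict[str, float]) -> float:
    """Average the three metric scores for one case (assumed 1-5, higher is better)."""
    return mean(scores[m] for m in METRICS)


def needs_refresh(case_scores: list[float], window: int = 5, drop: float = 0.5) -> bool:
    """Return True when the mean of the latest `window` cases falls more than
    `drop` below the mean of the first `window` cases."""
    if len(case_scores) < 2 * window:
        return False  # not enough cases to compare two full windows
    baseline = mean(case_scores[:window])
    recent = mean(case_scores[-window:])
    return baseline - recent > drop
```

A `True` result would be the cue to re-supply the demonstration examples, which is one plausible reading of the "midway re-training" mentioned above.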

CONCLUSION

The findings of this study suggest that LLMs can be a valuable tool in automating the generation of radiology impressions. Claude 2, in particular, exhibited promising results, indicating its potential for clinical implementation. However, the study also points to the necessity for further research, especially in optimizing model performance and evaluating real-world applicability.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/058b/11472864/fe8ab9c08bb5/cureus-0016-00000069410-i03.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/058b/11472864/dfe55e7f943d/cureus-0016-00000069410-i01.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/058b/11472864/602aa2a6db1a/cureus-0016-00000069410-i02.jpg

Similar articles

1
Programming Chatbots Using Natural Language: Generating Cervical Spine MRI Impressions.
Cureus. 2024 Sep 14;16(9):e69410. doi: 10.7759/cureus.69410. eCollection 2024 Sep.
2
Current safeguards, risk mitigation, and transparency measures of large language models against the generation of health disinformation: repeated cross sectional analysis.
BMJ. 2024 Mar 20;384:e078538. doi: 10.1136/bmj-2023-078538.
3
Comparing the Performance of Popular Large Language Models on the National Board of Medical Examiners Sample Questions.
Cureus. 2024 Mar 11;16(3):e55991. doi: 10.7759/cureus.55991. eCollection 2024 Mar.
4
Accuracy and Readability of Artificial Intelligence Chatbot Responses to Vasectomy-Related Questions: Public Beware.
Cureus. 2024 Aug 28;16(8):e67996. doi: 10.7759/cureus.67996. eCollection 2024 Aug.
5
Large Language Models Take on Cardiothoracic Surgery: A Comparative Analysis of the Performance of Four Models on American Board of Thoracic Surgery Exam Questions in 2023.
Cureus. 2024 Jul 22;16(7):e65083. doi: 10.7759/cureus.65083. eCollection 2024 Jul.
6
Evaluating Large Language Models for the National Premedical Exam in India: Comparative Analysis of GPT-3.5, GPT-4, and Bard.
JMIR Med Educ. 2024 Feb 21;10:e51523. doi: 10.2196/51523.
7
Learning to Make Rare and Complex Diagnoses With Generative AI Assistance: Qualitative Study of Popular Large Language Models.
JMIR Med Educ. 2024 Feb 13;10:e51391. doi: 10.2196/51391.
8
Assessing the Alignment of Large Language Models With Human Values for Mental Health Integration: Cross-Sectional Study Using Schwartz's Theory of Basic Values.
JMIR Ment Health. 2024 Apr 9;11:e55988. doi: 10.2196/55988.
9
Benchmarking large language models' performances for myopia care: a comparative analysis of ChatGPT-3.5, ChatGPT-4.0, and Google Bard.
EBioMedicine. 2023 Sep;95:104770. doi: 10.1016/j.ebiom.2023.104770. Epub 2023 Aug 23.
10
Large Language Models for Therapy Recommendations Across 3 Clinical Specialties: Comparative Study.
J Med Internet Res. 2023 Oct 30;25:e49324. doi: 10.2196/49324.

Cited by

1
Comparative performance of large language models in structuring head CT radiology reports: multi-institutional validation study in Japan.
Jpn J Radiol. 2025 May 14. doi: 10.1007/s11604-025-01799-1.
