Capability of multimodal large language models to interpret pediatric radiological images.

Author information

Department of Radiology, University of Iowa Hospitals and Clinics, Iowa City, IA, 52242, USA.

Department of Pediatrics, University of Iowa Hospitals and Clinics, Iowa City, IA, 52242, USA.

Publication information

Pediatr Radiol. 2024 Sep;54(10):1729-1737. doi: 10.1007/s00247-024-06025-0. Epub 2024 Aug 12.

Abstract

BACKGROUND

There is a dearth of artificial intelligence (AI) development and research dedicated to pediatric radiology. The newest iterations of large language models (LLMs) like ChatGPT can process image and video input in addition to text. They are thus theoretically capable of providing impressions of input radiological images.

OBJECTIVE

To assess the ability of multimodal LLMs to interpret pediatric radiological images.

MATERIALS AND METHODS

Thirty medically significant cases were collected and submitted, each with a short history, to GPT-4 (OpenAI, San Francisco, CA), Gemini 1.5 Pro (Google, Mountain View, CA), and Claude 3 Opus (Anthropic, San Francisco, CA), for a total of 90 images (30 cases across three models). AI responses were recorded and independently assessed for accuracy by a resident and an attending physician. The 95% confidence intervals were determined using the adjusted Wald method.

RESULTS

Overall, the models correctly diagnosed 27.8% (25/90) of images (95% CI=19.5-37.8%), were partially correct for 13.3% (12/90) of images (95% CI=2.7-26.4%), and were incorrect for 58.9% (53/90) of images (95% CI=48.6-68.5%).
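
The abstract names the adjusted Wald method but does not show the calculation. As a minimal sketch, assuming the common Agresti-Coull form of that method (add z²/2 successes and z² trials, then apply the ordinary Wald formula), the interval for a proportion could be computed as below. The function name `adjusted_wald_ci` and the default z value are illustrative assumptions, not taken from the paper.

```python
import math


def adjusted_wald_ci(successes, n, z=1.959964):
    """Adjusted Wald (Agresti-Coull) confidence interval for a proportion.

    Assumption: this is the textbook form of the method; the paper does not
    state which variant or software the authors used.
    Returns (lower, upper) as fractions between 0 and 1.
    """
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    # Adjust the counts: add z^2/2 successes and z^2 trials.
    n_adj = n + z ** 2
    p_adj = (successes + z ** 2 / 2) / n_adj
    # Ordinary Wald half-width, evaluated on the adjusted proportion.
    half_width = z * math.sqrt(p_adj * (1 - p_adj) / n_adj)
    # Clamp to the valid range for a proportion.
    return max(0.0, p_adj - half_width), min(1.0, p_adj + half_width)


if __name__ == "__main__":
    for label, k in [("correct", 25), ("incorrect", 53)]:
        lower, upper = adjusted_wald_ci(k, 90)
        print(f"{label}: {k}/90 -> {lower * 100:.1f}-{upper * 100:.1f}%")
    # correct: 25/90 -> 19.5-37.8%
    # incorrect: 53/90 -> 48.6-68.5%
```

Under these assumptions the sketch reproduces the reported intervals for 25/90 and 53/90. For 12/90 the same formula gives roughly 7.6-22.0%, which differs from the interval printed above, so the sketch may not reflect the exact procedure used for that figure.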

CONCLUSION

Multimodal LLMs are not yet capable of interpreting pediatric radiological images.
