Department of Radiology, University of Iowa Hospitals and Clinics, Iowa City, IA, 52242, USA.
Department of Pediatrics, University of Iowa Hospitals and Clinics, Iowa City, IA, 52242, USA.
Pediatr Radiol. 2024 Sep;54(10):1729-1737. doi: 10.1007/s00247-024-06025-0. Epub 2024 Aug 12.
There is a dearth of artificial intelligence (AI) development and research dedicated to pediatric radiology. The newest iterations of large language models (LLMs) like ChatGPT can process image and video input in addition to text. They are thus theoretically capable of providing impressions of input radiological images.
To assess the ability of multimodal LLMs to interpret pediatric radiological images.
Thirty medically significant cases were collected, and each case image was submitted with a short history to GPT-4 (OpenAI, San Francisco, CA), Gemini 1.5 Pro (Google, Mountain View, CA), and Claude 3 Opus (Anthropic, San Francisco, CA), for a total of 90 image interpretations. AI responses were recorded and independently assessed for accuracy by a resident and an attending physician. 95% confidence intervals were determined using the adjusted Wald method.
Overall, the models correctly diagnosed 27.8% (25/90) of images (95% CI=19.5-37.8%), were partially correct for 13.3% (12/90) of images (95% CI=2.7-26.4%), and were incorrect for 58.9% (53/90) of images (95% CI=48.6-68.5%).
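For readers who want to check the interval arithmetic, the following is a minimal sketch of an adjusted Wald (Agresti-Coull) interval for a binomial proportion. It is not the authors' code: the function name `adjusted_wald_ci` and the choice of z = 1.96 are assumptions made for illustration. Under those assumptions it gives intervals matching the reported ones for the correct (25/90) and incorrect (53/90) counts.

```python
import math


def adjusted_wald_ci(successes, n, z=1.96):
    """Adjusted Wald (Agresti-Coull) confidence interval for a proportion.

    Hypothetical helper for illustration; z = 1.96 is assumed for a 95% interval.
    """
    z2 = z * z
    n_adj = n + z2                        # adjusted sample size
    p_adj = (successes + z2 / 2) / n_adj  # adjusted point estimate
    half_width = z * math.sqrt(p_adj * (1 - p_adj) / n_adj)
    # Clamp to the valid range for a proportion.
    return max(0.0, p_adj - half_width), min(1.0, p_adj + half_width)


# Counts taken from the results above: 25/90 correct, 53/90 incorrect.
for label, count in [("correct", 25), ("incorrect", 53)]:
    low, high = adjusted_wald_ci(count, 90)
    print(f"{label}: {count}/90, 95% CI = {low:.1%} to {high:.1%}")
# correct: 25/90, 95% CI = 19.5% to 37.8%
# incorrect: 53/90, 95% CI = 48.6% to 68.5%
```

The adjustment adds z²/2 pseudo-successes and z² pseudo-trials before applying the ordinary Wald formula, which tends to behave better than the unadjusted interval at small sample sizes such as n = 90.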
Multimodal LLMs are not yet capable of interpreting pediatric radiological images.