Kurz Christoph F, Merzhevich Tatiana, Eskofier Bjoern M, Kather Jakob Nikolas, Gmeiner Benjamin
Novartis Pharma GmbH, Nuremberg, Germany.
Machine Learning and Data Analytics (MaD) Lab, Department of Artificial Intelligence in Biomedical Engineering, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), Erlangen, Germany.
NPJ Digit Med. 2025 Jul 10;8(1):423. doi: 10.1038/s41746-025-01837-2.
The applicability of vision-language models (VLMs) for acute care in emergency and intensive care units remains underexplored. Using a multimodal dataset of diagnostic questions involving medical images and clinical context, we benchmarked several small open-source VLMs against GPT-4o. While open models demonstrated limited diagnostic accuracy (up to 40.4%), GPT-4o significantly outperformed them (68.1%). Findings highlight the need for specialized training and optimization to improve open-source VLMs for acute care applications.