Benchmarking vision-language models for diagnostics in emergency and critical care settings.

Authors

Kurz Christoph F, Merzhevich Tatiana, Eskofier Bjoern M, Kather Jakob Nikolas, Gmeiner Benjamin

Affiliations

Novartis Pharma GmbH, Nuremberg, Germany.

Machine Learning and Data Analytics (MaD) lab, Department of Artificial Intelligence in Biomedical Engineering, Friedrich-Alexander Universität, Erlangen-Nürnberg (FAU), Erlangen, Germany.

Publication information

NPJ Digit Med. 2025 Jul 10;8(1):423. doi: 10.1038/s41746-025-01837-2.

Abstract

The applicability of vision-language models (VLMs) for acute care in emergency and intensive care units remains underexplored. Using a multimodal dataset of diagnostic questions involving medical images and clinical context, we benchmarked several small open-source VLMs against GPT-4o. While open models demonstrated limited diagnostic accuracy (up to 40.4%), GPT-4o significantly outperformed them (68.1%). Findings highlight the need for specialized training and optimization to improve open-source VLMs for acute care applications.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4316/12246445/ef0a51dadffe/41746_2025_1837_Fig1_HTML.jpg