

A clinically accessible small multimodal radiology model and evaluation metric for chest X-ray findings.

Authors

Zambrano Chaves Juan Manuel, Huang Shih-Cheng, Xu Yanbo, Xu Hanwen, Usuyama Naoto, Zhang Sheng, Wang Fei, Xie Yujia, Khademi Mahmoud, Yang Ziyi, Awadalla Hany, Gong Julia, Hu Houdong, Yang Jianwei, Li Chunyuan, Gao Jianfeng, Gu Yu, Wong Cliff, Wei Mu, Naumann Tristan, Chen Muhao, Lungren Matthew P, Chaudhari Akshay, Yeung-Levy Serena, Langlotz Curtis P, Wang Sheng, Poon Hoifung

Affiliations

Microsoft Research, Redmond, WA, USA.

Stanford University, Stanford, CA, USA.

Publication

Nat Commun. 2025 Apr 1;16(1):3108. doi: 10.1038/s41467-025-58344-x.

Abstract

Large foundation models show promise in biomedicine but face challenges in clinical use due to performance gaps, accessibility, cost, and lack of scalable evaluation. Here we show that open-source small multimodal models can bridge these gaps in radiology by generating free-text findings from chest X-ray images. Our data-centric approach leverages 697K curated radiology image-text pairs to train a specialized, domain-adapted chest X-ray encoder. We integrate this encoder with pre-trained language models via a lightweight adapter that aligns image and text modalities. To enable robust, clinically relevant evaluation, we develop and validate CheXprompt, a GPT-4-based metric for assessing factual accuracy aligned with radiologists' evaluations. Benchmarked with CheXprompt and other standard factuality metrics, LLaVA-Rad (7B) achieves state-of-the-art performance, outperforming much larger models like GPT-4V and Med-PaLM M (84B). While not immediately ready for real-time clinical deployment, LLaVA-Rad is a scalable, privacy-preserving and cost-effective step towards clinically adaptable multimodal AI for radiology.
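The adapter mentioned in the abstract follows the common LLaVA-style recipe: a small projection maps the vision encoder's patch embeddings into the language model's token-embedding space, so that image patches can be fed to the language model as if they were text tokens. A minimal NumPy sketch of this idea, with the two-layer MLP shape and all dimensions chosen purely for illustration (the paper's actual adapter architecture and sizes may differ):

```python
import numpy as np

def make_adapter(d_vision, d_text, hidden, rng):
    """Initialize a two-layer MLP projecting vision features into text space."""
    w1 = rng.standard_normal((d_vision, hidden)) * 0.02
    w2 = rng.standard_normal((hidden, d_text)) * 0.02
    return w1, w2

def project(patch_embeddings, w1, w2):
    """Map (num_patches, d_vision) -> (num_patches, d_text) visual 'tokens'."""
    h = np.maximum(patch_embeddings @ w1, 0.0)  # ReLU-style nonlinearity
    return h @ w2

rng = np.random.default_rng(0)
w1, w2 = make_adapter(d_vision=1024, d_text=4096, hidden=2048, rng=rng)
patches = rng.standard_normal((196, 1024))  # e.g. a 14x14 ViT patch grid
visual_tokens = project(patches, w1, w2)
print(visual_tokens.shape)  # (196, 4096): one pseudo-token per image patch
```

Because only this small projection (and optionally the encoder) is trained while the language model stays frozen or lightly tuned, the alignment step is cheap relative to end-to-end multimodal pretraining, which is consistent with the abstract's emphasis on cost-effectiveness.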


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7ff1/11962106/bf277d86a568/41467_2025_58344_Fig1_HTML.jpg
