
Accurate and comprehensive interpretation of pulmonary embolism (PE) from Computed Tomography Pulmonary Angiography (CTPA) scans remains a clinical challenge due to the limited specificity and structure of existing AI tools. We propose an agent-based framework that integrates Vision-Language Models (VLMs) for detecting 32 PE-related abnormalities and Large Language Models (LLMs) for structured report generation. Trained on over 69,000 CTPA studies from 24,890 patients across Brown University Health (BUH), Johns Hopkins University (JHU), and the INSPECT dataset from Stanford, the model demonstrates strong performance in abnormality classification and report generation. For abnormality classification, it achieved AUROC scores of 0.788 (BUH), 0.754 (INSPECT), and 0.710 (JHU); for report generation, the corresponding BERT-F1 scores were 0.891, 0.829, and 0.842. The abnormality-guided reporting strategy consistently outperformed the organ-based and holistic captioning baselines. For survival prediction, a multimodal fusion model that incorporates imaging, clinical variables, diagnostic outputs, and generated reports achieved concordance indices of 0.863 (BUH) and 0.731 (JHU), outperforming traditional PESI scores. This framework provides a clinically meaningful and interpretable solution for end-to-end PE diagnosis, structured reporting, and outcome prediction.
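
The abstract reports survival performance as a concordance index. As a reading aid, the following is a minimal Python sketch of Harrell's concordance index in its common textbook form. It is not the authors' evaluation code: the function name `concordance_index`, the simplified handling of tied times, and the toy data are all assumptions made for illustration.

```python
from itertools import combinations


def concordance_index(times, events, risks):
    """Harrell's C: share of comparable pairs that the risk score orders correctly.

    times  -- follow-up time per subject
    events -- 1 if the event was observed, 0 if the subject was censored
    risks  -- predicted risk score (higher means earlier expected event)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` is the subject with the shorter time.
        a, b = (i, j) if times[i] <= times[j] else (j, i)
        # Simplification (assumption): pairs with tied times are skipped.
        # A pair is comparable only if the earlier subject had an observed event.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0   # earlier event received the higher risk
        elif risks[a] == risks[b]:
            concordant += 0.5   # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical, not from the study).
times = [5, 10, 15, 20]
events = [1, 0, 1, 1]
risks = [0.9, 0.4, 0.1, 0.2]
print(concordance_index(times, events, risks))  # 0.75
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives a scale for reading the reported 0.863 (BUH) and 0.731 (JHU).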

Vision-language model for report generation and outcome prediction in CT pulmonary angiogram.

Author information

Zhong Zhusi, Wang Yuli, Wu Jing, Hsu Wen-Chi, Somasundaram Vin, Bi Lulu, Kulkarni Shreyas, Ma Zhuoqi, Collins Scott, Baird Grayson, Ahn Sun Ho, Feng Xue, Kamel Ihab, Lin Cheng Ting, Greineder Colin, Atalay Michael, Jiao Zhicheng, Bai Harrison

Affiliations

Department of Diagnostic Imaging, Brown University Health, Providence, RI, USA.

Warren Alpert Medical School of Brown University, Providence, RI, USA.

Publication information

NPJ Digit Med. 2025 Jul 12;8(1):432. doi: 10.1038/s41746-025-01807-8.
