Self-supervised multi-modal training from uncurated images and reports enables monitoring AI in radiology.

Authors

Park Sangjoon, Lee Eun Sun, Shin Kyung Sook, Lee Jeong Eun, Ye Jong Chul

Affiliations

Department of Radiation Oncology, Yonsei College of Medicine, Seoul, Republic of Korea.

Chung-Ang University Hospital, Seoul, Republic of Korea.

Publication information

Med Image Anal. 2024 Jan;91:103021. doi: 10.1016/j.media.2023.103021. Epub 2023 Nov 7.

DOI: 10.1016/j.media.2023.103021
PMID: 37952385

Abstract

The escalating demand for artificial intelligence (AI) systems that can monitor and supervise human errors and abnormalities in healthcare presents unique challenges. Recent advances in vision-language models reveal the challenges of monitoring AI by understanding both visual and textual concepts and their semantic correspondences. However, there has been limited success in the application of vision-language models in the medical domain. Current vision-language models and learning strategies for photographic images and captions call for a web-scale data corpus of image and text pairs which is not often feasible in the medical domain. To address this, we present a model named medical cross-attention vision-language model (Medical X-VL), which leverages key components to be tailored for the medical domain. The model is based on the following components: self-supervised unimodal models in medical domain and a fusion encoder to bridge them, momentum distillation, sentencewise contrastive learning for medical reports, and sentence similarity-adjusted hard negative mining. We experimentally demonstrated that our model enables various zero-shot tasks for monitoring AI, ranging from the zero-shot classification to zero-shot error correction. Our model outperformed current state-of-the-art models in two medical image datasets, suggesting a novel clinical application of our monitoring AI model to alleviate human errors. Our method demonstrates a more specialized capacity for fine-grained understanding, which presents a distinct advantage particularly applicable to the medical domain.
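
To make the contrastive-learning component named in the abstract more concrete, the following is a minimal sketch of a generic image-to-text InfoNCE loss in plain Python. It is not the authors' implementation: the function names `cosine` and `info_nce`, the temperature of 0.07, and the two-dimensional toy embeddings are all assumptions made for illustration, and the abstract does not specify how Medical X-VL forms its sentence-level pairs or mines hard negatives.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce(image_embs, sentence_embs, temperature=0.07):
    """Image-to-text InfoNCE loss; index i in both lists is a matched pair.

    The temperature value is an assumed default, not taken from the paper.
    """
    losses = []
    for i, img in enumerate(image_embs):
        logits = [cosine(img, s) / temperature for s in sentence_embs]
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        # Negative log-probability of the matched sentence.
        losses.append(log_denom - logits[i])
    return sum(losses) / len(losses)

# Hypothetical toy embeddings: two images and two report sentences.
images = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]   # sentence i matches image i
swapped = [aligned[1], aligned[0]]   # deliberately mismatched pairing

loss_aligned = info_nce(images, aligned)
loss_swapped = info_nce(images, swapped)
```

In this toy setup the loss is small when each image is paired with its own sentence and large when the pairing is swapped, which is the behavior a contrastive objective is meant to reward.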

Similar articles

1
Self-supervised multi-modal training from uncurated images and reports enables monitoring AI in radiology.
Med Image Anal. 2024 Jan;91:103021. doi: 10.1016/j.media.2023.103021. Epub 2023 Nov 7.
2
Knowledge-enhanced visual-language pre-training on chest radiology images.
Nat Commun. 2023 Jul 28;14(1):4542. doi: 10.1038/s41467-023-40260-7.
3
Multi-Grained Radiology Report Generation With Sentence-Level Image-Language Contrastive Learning.
IEEE Trans Med Imaging. 2024 Jul;43(7):2657-2669. doi: 10.1109/TMI.2024.3372638. Epub 2024 Jul 1.
4
Merlin: A Vision Language Foundation Model for 3D Computed Tomography.
Res Sq. 2024 Jun 28:rs.3.rs-4546309. doi: 10.21203/rs.3.rs-4546309/v1.
5
Intensive vision-guided network for radiology report generation.
Phys Med Biol. 2024 Feb 5;69(4). doi: 10.1088/1361-6560/ad1995.
6
Overcoming the Challenges in the Development and Implementation of Artificial Intelligence in Radiology: A Comprehensive Review of Solutions Beyond Supervised Learning.
Korean J Radiol. 2023 Nov;24(11):1061-1080. doi: 10.3348/kjr.2023.0393. Epub 2023 Aug 28.
7
Multi-Modal Understanding and Generation for Medical Images and Text via Vision-Language Pre-Training.
IEEE J Biomed Health Inform. 2022 Dec;26(12):6070-6080. doi: 10.1109/JBHI.2022.3207502. Epub 2022 Dec 7.
8
Improving Medical Vision-Language Contrastive Pretraining With Semantics-Aware Triage.
IEEE Trans Med Imaging. 2023 Dec;42(12):3579-3589. doi: 10.1109/TMI.2023.3294980. Epub 2023 Nov 30.
9
MKCL: Medical Knowledge with Contrastive Learning model for radiology report generation.
J Biomed Inform. 2023 Oct;146:104496. doi: 10.1016/j.jbi.2023.104496. Epub 2023 Sep 11.
10
Transparent medical image AI via an image-text foundation model grounded in medical literature.
Nat Med. 2024 Apr;30(4):1154-1165. doi: 10.1038/s41591-024-02887-x. Epub 2024 Apr 16.

Cited by

1
Explainable semi-supervised model for predicting invasion depth of esophageal squamous cell carcinoma based on the IPCL and AVA patterns.
Sci Rep. 2025 Jul 2;15(1):22519. doi: 10.1038/s41598-025-06172-w.
2
Advancements in Medical Radiology Through Multimodal Machine Learning: A Comprehensive Overview.
Bioengineering (Basel). 2025 Apr 30;12(5):477. doi: 10.3390/bioengineering12050477.
3
A Systematic Review and Implementation Guidelines of Multimodal Foundation Models in Medical Imaging.
Res Sq. 2025 Apr 28:rs.3.rs-5537908. doi: 10.21203/rs.3.rs-5537908/v1.
4
Cross-modal contrastive learning for unified placenta analysis using photographs.
Patterns (N Y). 2024 Nov 19;5(12):101097. doi: 10.1016/j.patter.2024.101097. eCollection 2024 Dec 13.
5
IQAGPT: computed tomography image quality assessment with vision-language and ChatGPT models.
Vis Comput Ind Biomed Art. 2024 Aug 5;7(1):20. doi: 10.1186/s42492-024-00171-w.
6
A Semi-Supervised Learning Framework for Classifying Colorectal Neoplasia Based on the NICE Classification.
J Imaging Inform Med. 2024 Oct;37(5):2342-2353. doi: 10.1007/s10278-024-01123-9. Epub 2024 Apr 23.