A Curriculum Learning Based Approach to Captioning Ultrasound Images.

Authors

Alsharid Mohammad, El-Bouri Rasheed, Sharma Harshita, Drukker Lior, Papageorghiou Aris T, Noble J Alison

Affiliations

Institute of Biomedical Engineering, University of Oxford, UK.

Nuffield Dept. of Women's & Reproductive Health, University of Oxford, UK.

Publication information

Med Ultrasound Preterm Perinat Paediatr Image Anal (2020). 2020 Oct;12437:75-84. doi: 10.1007/978-3-030-60334-2_8. Epub 2020 Oct 1.

DOI:10.1007/978-3-030-60334-2_8

PMID:33103165

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7116255/

Abstract

We present a novel curriculum learning approach to train a natural language processing (NLP) based fetal ultrasound image captioning model. Datasets that pair medical images with textual descriptions are relatively rare and therefore smaller than datasets of natural images and their captions. This motivated us to develop an approach for training a captioning model suited to small medical datasets. Our datasets are prepared from real-world ultrasound video together with synchronised, transcribed sonographer speech recordings. We propose a "dual-curriculum" method that builds, and learns from, curricula of image and text information. We compare several distance measures for creating the dual curriculum and observe the best performance with the Wasserstein distance for image information and the tf-idf metric for text information. Compared with stochastic mini-batch training, curriculum learning improves all performance metrics on the individual task of image classification, and the dual curriculum does so on image captioning.
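The abstract names the two difficulty measures but does not say how the curricula are assembled, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes that image difficulty is the Wasserstein-1 distance between an image's intensity histogram and a reference histogram, that text difficulty is the mean tf-idf weight of a caption's words, and that the two are merged by summing their ranks. All function names, the rank-sum rule, the cumulative staging, and the toy data are hypothetical.

```python
import math
from collections import Counter


def wasserstein_1d(p, q):
    """Wasserstein-1 distance between two histograms on the same unit-width bins."""
    total_p, total_q = sum(p), sum(q)
    cdf_gap, distance = 0.0, 0.0
    for a, b in zip(p, q):
        cdf_gap += a / total_p - b / total_q  # running difference of the two CDFs
        distance += abs(cdf_gap)
    return distance


def tfidf_scores(captions):
    """Mean tf-idf weight per caption (assumed proxy for text difficulty).

    Assumes every caption contains at least one word.
    """
    docs = [caption.lower().split() for caption in captions]
    n_docs = len(docs)
    doc_freq = Counter(word for doc in docs for word in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        weights = [
            (counts[w] / len(doc)) * math.log(n_docs / doc_freq[w])
            for w in counts
        ]
        scores.append(sum(weights) / len(weights))
    return scores


def ranks(values):
    """Rank of each value (0 = smallest); ties keep their original order."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for position, index in enumerate(order):
        result[index] = position
    return result


def dual_curriculum(histograms, captions, reference_histogram, n_stages=2):
    """Order samples easy-to-hard and return cumulative training stages."""
    image_difficulty = [wasserstein_1d(h, reference_histogram) for h in histograms]
    text_difficulty = tfidf_scores(captions)
    # Assumption: combine the two curricula by summing their ranks.
    combined = [i + t for i, t in zip(ranks(image_difficulty), ranks(text_difficulty))]
    order = sorted(range(len(captions)), key=lambda i: combined[i])
    stage_size = math.ceil(len(order) / n_stages)
    # Each stage keeps the easier samples and adds the next-hardest slice.
    return [order[: stage_size * (s + 1)] for s in range(n_stages)]


# Toy data (invented): 4-bin intensity histograms and short captions.
histograms = [[4, 3, 2, 1], [4, 3, 1, 2], [1, 2, 3, 4], [2, 2, 3, 3]]
captions = [
    "this is the head",
    "this is the head here",
    "measuring the abdominal circumference now",
    "this is the spine",
]
reference = [4, 3, 2, 1]
stages = dual_curriculum(histograms, captions, reference)  # [[0, 1], [0, 1, 3, 2]]
```

In this sketch, a training loop would fit the model on `stages[0]` first and then continue on the larger `stages[1]`, whereas the stochastic mini-batch baseline mentioned in the abstract would sample from all four items from the start.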

Similar articles

A Curriculum Learning Based Approach to Captioning Ultrasound Images.

Med Ultrasound Preterm Perinat Paediatr Image Anal (2020). 2020 Oct;12437:75-84. doi: 10.1007/978-3-030-60334-2_8. Epub 2020 Oct 1.

Weakly Supervised Captioning of Ultrasound Images.

Med Image Underst Anal (2022). 2022 Jul;13413:187-198. doi: 10.1007/978-3-031-12053-4_14.

Gaze-assisted automatic captioning of fetal ultrasound videos using three-way multi-modal deep neural networks.

Med Image Anal. 2022 Nov;82:102630. doi: 10.1016/j.media.2022.102630. Epub 2022 Sep 17.

Captioning Ultrasound Images Automatically.

Med Image Comput Comput Assist Interv. 2019 Oct;22:338-346. doi: 10.1007/978-3-030-32251-9_37. Epub 2019 Oct 10.

A Course-Focused Dual Curriculum For Image Captioning.

Proc IEEE Int Symp Biomed Imaging. 2021 Apr;2021:716-720. doi: 10.1109/ISBI48211.2021.9434055. Epub 2021 May 25.

Towards Generating and Evaluating Iconographic Image Captions of Artworks.

J Imaging. 2021 Jul 23;7(8):123. doi: 10.3390/jimaging7080123.

Image Captioning Based on Semantic Scenes.

Entropy (Basel). 2024 Oct 18;26(10):876. doi: 10.3390/e26100876.

Arabic Captioning for Images of Clothing Using Deep Learning.

Sensors (Basel). 2023 Apr 7;23(8):3783. doi: 10.3390/s23083783.

A dental intraoral image dataset of gingivitis for image captioning.

Data Brief. 2024 Sep 19;57:110960. doi: 10.1016/j.dib.2024.110960. eCollection 2024 Dec.

From vision to text: A comprehensive review of natural image captioning in medical diagnosis and radiology report generation.

Med Image Anal. 2024 Oct;97:103264. doi: 10.1016/j.media.2024.103264. Epub 2024 Jul 8.

Cited by

Weakly Supervised Captioning of Ultrasound Images.

Med Image Underst Anal (2022). 2022 Jul;13413:187-198. doi: 10.1007/978-3-031-12053-4_14.

Automatic captioning for medical imaging (MIC): a rapid review of literature.

Artif Intell Rev. 2023;56(5):4019-4076. doi: 10.1007/s10462-022-10270-w. Epub 2022 Sep 17.

A Course-Focused Dual Curriculum For Image Captioning.

Proc IEEE Int Symp Biomed Imaging. 2021 Apr;2021:716-720. doi: 10.1109/ISBI48211.2021.9434055. Epub 2021 May 25.

References

Hospital Admission Location Prediction via Deep Interpretable Networks for the Year-Round Improvement of Emergency Patient Care.

IEEE J Biomed Health Inform. 2021 Jan;25(1):289-300. doi: 10.1109/JBHI.2020.2990309. Epub 2021 Jan 5.

Spatio-Temporal Partitioning and Description of Full-Length Routine Fetal Anomaly Ultrasound Scans.

Proc IEEE Int Symp Biomed Imaging. 2019;16:987-990. doi: 10.1109/ISBI.2019.8759149. Epub 2019 Jul 11.

Captioning Ultrasound Images Automatically.

Med Image Comput Comput Assist Interv. 2019 Oct;22:338-346. doi: 10.1007/978-3-030-32251-9_37. Epub 2019 Oct 10.

A Curriculum Learning Strategy to Enhance the Accuracy of Classification of Various Lesions in Chest-PA X-ray Screening for Pulmonary Abnormalities.

Sci Rep. 2019 Oct 25;9(1):15352. doi: 10.1038/s41598-019-51832-3.

Automatic CNN-based detection of cardiac MR motion artefacts using k-space data augmentation and curriculum learning.

Med Image Anal. 2019 Jul;55:136-147. doi: 10.1016/j.media.2019.04.009. Epub 2019 Apr 22.
