Alsharid Mohammad, El-Bouri Rasheed, Sharma Harshita, Drukker Lior, Papageorghiou Aris T, Noble J Alison
Institute of Biomedical Engineering, University of Oxford, UK.
Nuffield Dept. of Women's & Reproductive Health, University of Oxford, UK.
Med Ultrasound Preterm Perinat Paediatr Image Anal (2020). 2020 Oct;12437:75-84. doi: 10.1007/978-3-030-60334-2_8. Epub 2020 Oct 1.
We present a novel curriculum learning approach for training a natural language processing (NLP) based fetal ultrasound image captioning model. Datasets that pair medical images with textual descriptions are relatively rare and therefore smaller than datasets of natural images and their captions. This motivated us to develop a training approach suited to small medical datasets. Our datasets are prepared from real-world ultrasound video together with synchronised, transcribed sonographer speech recordings. We propose a "dual-curriculum" method that builds curricula from both image and text information and learns from them. We compare several distance measures for creating the dual curriculum and observe the best performance with the Wasserstein distance for image information and the tf-idf metric for text information. The evaluation results show that all performance metrics improve over stochastic mini-batch training, both when curriculum learning is used for the individual task of image classification and when the dual curriculum is used for image captioning.
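The abstract names the two measures behind the dual curriculum but not how they are turned into a training order. The sketch below is one possible reading, not the authors' implementation: it assumes each image is a flattened list of pixel intensities, scores image difficulty as the 1-D Wasserstein distance to a hypothetical `reference` sample, scores text difficulty as a summed tf-idf weight per caption, and orders samples from easy to hard by the sum of the two ranks. The function names, the choice of reference, and the rank-sum combination are all assumptions made for illustration.

```python
import math
from collections import Counter


def wasserstein_1d(a, b):
    """1-D Wasserstein-1 distance between two equal-size empirical samples.

    For equal-size samples this is the mean absolute difference of the
    sorted values. Unequal sizes are out of scope for this sketch.
    """
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal-size samples")
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


def tfidf_score(captions):
    """Summed tf-idf weight of each caption over the caption corpus.

    tf is the token count divided by caption length; idf is log(N / df).
    Captions made of rarer words receive higher scores.
    """
    docs = [c.lower().split() for c in captions]
    n_docs = len(docs)
    doc_freq = Counter(tok for d in docs for tok in set(d))
    scores = []
    for d in docs:
        if not d:  # empty caption: no tokens to weigh
            scores.append(0.0)
            continue
        tf = Counter(d)
        scores.append(
            sum((tf[t] / len(d)) * math.log(n_docs / doc_freq[t]) for t in tf)
        )
    return scores


def rank(values):
    """Rank position of each value (0 = smallest); ties keep input order."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for pos, i in enumerate(order):
        ranks[i] = pos
    return ranks


def dual_curriculum_order(images, captions, reference):
    """Sample indices ordered easy to hard (assumed rule: sum of both ranks)."""
    img_difficulty = [wasserstein_1d(img, reference) for img in images]
    txt_difficulty = tfidf_score(captions)
    combined = [
        ri + rt for ri, rt in zip(rank(img_difficulty), rank(txt_difficulty))
    ]
    return sorted(range(len(images)), key=lambda i: combined[i])
```

A training loop would then draw mini-batches following the returned index order instead of shuffling. With SciPy available, `scipy.stats.wasserstein_distance` could replace the hand-written distance and would also handle samples of unequal size.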