ETSI Informáticos, Universidad Politécnica de Madrid, 28660 Boadilla del Monte, Madrid, Spain.
Ontology Engineering Group, Departamento de Inteligencia Artificial, ETSI Informáticos, Universidad Politécnica de Madrid, 28660 Boadilla del Monte, Madrid, Spain; Departamento de Sistemas Informáticos, ETSI Sistemas Informáticos, Universidad Politécnica de Madrid, 28031 Madrid, Spain.
Med Image Anal. 2024 Oct;97:103264. doi: 10.1016/j.media.2024.103264. Epub 2024 Jul 8.
Natural Image Captioning (NIC) is an interdisciplinary research area at the intersection of Computer Vision (CV) and Natural Language Processing (NLP). Many works have been presented on the subject, ranging from early template-based approaches to more recent deep learning-based methods. This paper surveys the area of NIC, focusing especially on its applications to Medical Image Captioning (MIC) and Diagnostic Captioning (DC) in the field of radiology. A review of the state of the art summarizes key research works in NIC and DC to provide a broad overview of the subject. These works include existing NIC and MIC models, datasets, evaluation metrics, and previous reviews in the specialized literature. The reviewed work is thoroughly analyzed and discussed, highlighting the limitations of existing approaches and their potential implications for real clinical practice. Finally, potential future research lines are outlined on the basis of the limitations identified.