

Multi-modal transformer architecture for medical image analysis and automated report generation.

Affiliations

School of Computer Science and Engineering, Vellore Institute of Technology, Chennai, India.

Centre for Advanced Data Science, Vellore Institute of Technology, Chennai, India.

Publication information

Sci Rep. 2024 Aug 20;14(1):19281. doi: 10.1038/s41598-024-69981-5.

Abstract

Medical practitioners examine medical images such as X-rays, write reports based on the findings, and provide conclusive statements. Manual interpretation of results and report writing are time-consuming processes that can delay diagnosis. We propose an automated report generation model for medical images built on an encoder-decoder architecture. The encoder is a transformer: the Vision Transformer (ViT) or one of its variants, the Data-efficient Image Transformer (DeiT) and the BERT pre-training of image transformers (BEiT), adapted to extract visual information from medical images. Reports are converted into text embeddings, and the Generative Pre-trained Transformer (GPT2) model serves as the decoder that generates the medical report. A cross-attention mechanism between the vision transformer and GPT2 enables the model to produce detailed, coherent reports grounded in the visual information extracted by the encoder. We further extend report generation with general knowledge that is independent of the inputs, yielding a more comprehensive report. We conduct experiments on the Indiana University X-ray dataset to demonstrate the effectiveness of our models. Generated reports are evaluated with word-overlap metrics such as BLEU scores, ROUGE-L, and retrieval-augmented (RAG) answer correctness, and with similarity metrics such as skip-thought cosine similarity, greedy matching, vector extrema, and RAG answer similarity. Results show that our model outperforms recurrent models on report generation, answer similarity, and word-overlap metrics. By automating report generation and incorporating advanced transformer architectures and general knowledge, our approach has the potential to significantly improve the efficiency and accuracy of medical image analysis and report generation.
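The core coupling described above, in which GPT2 decoder states attend to ViT patch embeddings, can be illustrated with a minimal single-head cross-attention sketch. This is not the authors' implementation: the random projection matrices stand in for learned weights, and dimensions are toy values chosen for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(text_states, image_states):
    """Single-head cross-attention: report tokens (queries) attend to
    image patch embeddings (keys/values), as in a ViT-to-GPT2 decoder."""
    d = text_states.shape[-1]
    rng = np.random.default_rng(0)
    # Hypothetical projections; in a trained model these are learned.
    W_q = rng.standard_normal((d, d)) / np.sqrt(d)
    W_k = rng.standard_normal((d, d)) / np.sqrt(d)
    W_v = rng.standard_normal((d, d)) / np.sqrt(d)
    Q = text_states @ W_q            # (T_text, d)
    K = image_states @ W_k           # (T_img, d)
    V = image_states @ W_v           # (T_img, d)
    scores = Q @ K.T / np.sqrt(d)    # (T_text, T_img) similarity scores
    weights = softmax(scores, -1)    # each text token's attention over patches
    return weights @ V               # (T_text, d) image-conditioned states

# Toy example: 5 report tokens attending to 16 image patches, dim 32.
text = np.random.default_rng(1).standard_normal((5, 32))
patches = np.random.default_rng(2).standard_normal((16, 32))
out = cross_attention(text, patches)
print(out.shape)
```

Each output row is a mixture of image-patch values weighted by how relevant each patch is to that report token, which is what lets the decoder condition its next-word predictions on the X-ray content.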


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3383/11336090/ae6b2885f49b/41598_2024_69981_Fig1_HTML.jpg
