
BPI-MVQA: a bi-branch model for medical visual question answering.

Affiliations

Kunming Shipborne Equipment Research and Test Center, Kunming, 650106, People's Republic of China.

School of Information Science and Engineering, Yunnan University, No. 2, North Cuihu Road, Kunming, 650091, People's Republic of China.

Publication Information

BMC Med Imaging. 2022 Apr 29;22(1):79. doi: 10.1186/s12880-022-00800-x.

Abstract

BACKGROUND

Visual question answering in the medical domain (VQA-Med) shows great potential for increasing confidence in disease diagnosis and helping patients better understand their medical conditions. One challenge in VQA-Med is how to better understand and combine the semantic features of medical images (e.g., X-rays, magnetic resonance imaging (MRI)) and accurately answer the corresponding questions on unlabeled medical datasets.

METHOD

We propose a novel Bi-branched model based on Parallel networks and Image retrieval for Medical Visual Question Answering (BPI-MVQA). The first branch of BPI-MVQA is a transformer structure built on a parallel network that combines the complementary strengths of image sequence-feature and spatial-feature extraction, with multi-modal features fused implicitly through the multi-head self-attention mechanism. The second branch retrieves images whose VGG16-generated features are most similar to the query image and uses their associated text descriptions as labels.
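The retrieval branch described above can be illustrated with a minimal sketch: given a query image's feature vector (e.g., taken from a VGG16 layer), find the most similar reference image by cosine similarity and reuse its text description as the label. The function names, feature dimensions, and toy data below are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def cosine_similarity(query, refs):
    """Cosine similarity between a query vector and each row of a reference matrix."""
    q = query / np.linalg.norm(query)
    r = refs / np.linalg.norm(refs, axis=1, keepdims=True)
    return r @ q

def retrieve_description(query_feat, ref_feats, ref_descriptions):
    """Return the text description of the reference image most similar to the query."""
    sims = cosine_similarity(query_feat, ref_feats)
    return ref_descriptions[int(np.argmax(sims))]

# Toy example: three reference images with 4-dim features (real VGG16
# features would be much higher-dimensional, e.g. 4096-dim fc7 outputs).
refs = np.array([[1.0, 0.0, 0.0, 0.0],
                 [0.0, 1.0, 0.0, 0.0],
                 [0.9, 0.1, 0.0, 0.0]])
descs = ["axial CT of the chest", "sagittal MRI of the spine", "PA chest x-ray"]
query = np.array([1.0, 0.0, 0.0, 0.1])

print(retrieve_description(query, refs, descs))  # -> axial CT of the chest
```

In practice the retrieved description would then be matched against the question to produce an open-ended answer; the sketch only shows the nearest-neighbor lookup step.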

RESULT

The BPI-MVQA model achieves state-of-the-art results on three VQA-Med datasets, with main metric scores exceeding the previous best results by 0.2%, 1.4%, and 1.1%.

CONCLUSION

The evaluation results support the effectiveness of the BPI-MVQA model in VQA-Med. The bi-branch design helps the model answer different types of visual questions. The parallel network enables multi-angle image feature extraction, which helps the model better understand the semantic information of images and achieve higher accuracy in the multi-classification setting of VQA-Med. In addition, image retrieval helps the model answer irregular, open-ended questions by leveraging the information provided by similar images. Comparison with state-of-the-art methods on three datasets also shows that our method brings substantial improvements to VQA-Med systems.


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7622/9052498/74841e363b1a/12880_2022_800_Fig1_HTML.jpg
