Development of a large-scale medical visual question-answering dataset.

Authors

Zhang Xiaoman, Wu Chaoyi, Zhao Ziheng, Lin Weixiong, Zhang Ya, Wang Yanfeng, Xie Weidi

Affiliations

Shanghai Jiao Tong University, Shanghai, China.

Shanghai Artificial Intelligence Laboratory, Shanghai, China.

Publication information

Commun Med (Lond). 2024 Dec 21;4(1):277. doi: 10.1038/s43856-024-00709-2.

DOI:10.1038/s43856-024-00709-2

PMID:39709495

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11663219/

Abstract

BACKGROUND

Medical Visual Question Answering (MedVQA) enhances diagnostic accuracy and healthcare delivery by leveraging artificial intelligence to interpret medical images. This study aims to redefine MedVQA as a generation task that mirrors human-machine interaction and to develop a model capable of integrating complex visual and textual information.

METHODS

We constructed a large-scale medical visual-question answering dataset, PMC-VQA, containing 227,000 VQA pairs across 149,000 images that span various modalities and diseases. We introduced a generative model that aligns visual information from a pre-trained vision encoder with a large language model. This model was initially trained on PMC-VQA and subsequently fine-tuned on multiple public benchmarks.
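
The abstract does not say how the visual information is aligned with the language model. As a minimal sketch of one common pattern, assuming a learned linear projection followed by prefix concatenation (an assumption, not a confirmed detail of the authors' model), the visual features can be mapped to the language model's embedding width and placed ahead of the question tokens. All names and sizes below are hypothetical, and random numbers stand in for real encoder outputs and learned weights.

```python
import random

def project(features, weight):
    """Map each visual feature vector (length d_vision) to the text embedding width.

    `weight` is stored as d_text columns, each of length d_vision.
    """
    return [[sum(f * w for f, w in zip(vec, col)) for col in weight]
            for vec in features]

def build_prefix_sequence(visual_features, weight, question_embeddings):
    """Prepend the projected visual tokens to the question token embeddings."""
    visual_tokens = project(visual_features, weight)
    return visual_tokens + question_embeddings

random.seed(0)
D_VISION, D_TEXT = 4, 3                  # hypothetical embedding widths
N_PATCHES, N_QUESTION_TOKENS = 2, 5      # hypothetical sequence lengths

# Stand-ins for the frozen vision encoder output and the learned projection.
visual_features = [[random.random() for _ in range(D_VISION)]
                   for _ in range(N_PATCHES)]
weight = [[random.random() for _ in range(D_VISION)] for _ in range(D_TEXT)]
# Stand-in for the embedded question tokens.
question_embeddings = [[random.random() for _ in range(D_TEXT)]
                       for _ in range(N_QUESTION_TOKENS)]

sequence = build_prefix_sequence(visual_features, weight, question_embeddings)
print(len(sequence), len(sequence[0]))   # 7 tokens, each of width 3
```

In a setup like this, the language model would then generate the free-form answer conditioned on the combined sequence; which components are trained or frozen is not stated in the abstract.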

RESULTS

Here, we show that our model significantly outperforms existing MedVQA models in generating relevant, accurate free-form answers. We also propose a manually verified test set that presents a greater challenge and serves as a robust measure to monitor the advancement of generative MedVQA methods.

CONCLUSIONS

The PMC-VQA dataset proves to be an essential resource for the research community, and our model marks a significant breakthrough in MedVQA. We maintain a leaderboard to facilitate comprehensive evaluation and comparison, providing a centralized resource for benchmarking state-of-the-art approaches.
