IQAGPT: computed tomography image quality assessment with vision-language and ChatGPT models.

Author information

Chen Zhihao, Hu Bin, Niu Chuang, Chen Tao, Li Yuxin, Shan Hongming, Wang Ge

Affiliations

Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, 200433, China.

Department of Radiology, Huashan Hospital, Fudan University, Shanghai, 200040, China.

Publication information

Vis Comput Ind Biomed Art. 2024 Aug 5;7(1):20. doi: 10.1186/s42492-024-00171-w.

DOI:10.1186/s42492-024-00171-w

PMID:39101954

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11300764/

Abstract

Large language models (LLMs), such as ChatGPT, have demonstrated impressive capabilities in various tasks and attracted increasing interest as a natural language interface across many domains. Recently, large vision-language models (VLMs) that learn rich vision-language correlations from image-text pairs, such as BLIP-2 and GPT-4, have been intensively investigated. Despite these developments, the application of LLMs and VLMs to image quality assessment (IQA), particularly in medical imaging, remains unexplored. Such an application would be valuable for objective performance evaluation and could supplement or even replace radiologists' opinions. To this end, this study introduces IQAGPT, a computed tomography (CT) IQA system that integrates an image-quality captioning VLM with ChatGPT to generate quality scores and textual reports. First, a CT-IQA dataset comprising 1,000 CT slices with diverse quality levels is professionally annotated and compiled for training and evaluation. To better leverage the capabilities of LLMs, the annotated quality scores are converted into semantically rich text descriptions using a prompt template. Second, the image-quality captioning VLM is fine-tuned on the CT-IQA dataset to generate quality descriptions. The captioning model fuses image and text features through cross-modal attention. Third, based on the quality descriptions, users ask ChatGPT in natural language to assign image-quality scores or produce radiological quality reports. The results demonstrate the feasibility of assessing image quality using LLMs. The proposed IQAGPT outperformed GPT-4 and CLIP-IQA, as well as multitask classification and regression models that rely solely on images.
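The conversion of annotated scores into text descriptions can be illustrated with a minimal Python sketch. This is not the authors' template: the abstract does not give the wording, the score scale, or the annotated criteria, so the `TEMPLATE` string, the five-level `LEVELS` mapping, and the aspect names `noise` and `artifacts` below are all hypothetical and chosen only to show the idea.

```python
from __future__ import annotations

# Hypothetical five-level scale; the abstract does not state the real one.
LEVELS = {0: "very poor", 1: "poor", 2: "fair", 3: "good", 4: "excellent"}

# Hypothetical prompt template, not the wording used in the paper.
TEMPLATE = "In terms of {aspect}, this CT image is rated {level}."


def scores_to_caption(scores: dict[str, int]) -> str:
    """Render per-aspect integer scores as a single text description."""
    sentences = []
    for aspect, score in scores.items():
        if score not in LEVELS:
            raise ValueError(
                f"score for {aspect!r} must be one of {sorted(LEVELS)}"
            )
        sentences.append(TEMPLATE.format(aspect=aspect, level=LEVELS[score]))
    return " ".join(sentences)


if __name__ == "__main__":
    # Aspect names are illustrative assumptions.
    print(scores_to_caption({"noise": 3, "artifacts": 1}))
```

Under these assumptions, the scores `{"noise": 3, "artifacts": 1}` become "In terms of noise, this CT image is rated good. In terms of artifacts, this CT image is rated poor." Text of this kind could then serve as a caption target when fine-tuning a captioning model, or as input to an LLM asked to produce a score or report.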
