Chen Xiaolan, Xu Pusheng, Li Yao, Zhang Weiyi, Song Fan, He Mingguang, Shi Danli
School of Optometry, The Hong Kong Polytechnic University, Kowloon, Hong Kong.
State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangdong Provincial Clinical Research Center for Ocular Diseases, Guangzhou 510060, China.
iScience. 2024 May 17;27(7):110021. doi: 10.1016/j.isci.2024.110021. eCollection 2024 Jul 19.
Existing automatic analysis of fundus fluorescein angiography (FFA) images faces limitations, including restriction to a predetermined set of image classifications and confinement to text-based question-answering (QA) approaches. This study aims to address these limitations by developing an end-to-end unified model that uses synthetic data to train a visual question-answering model for FFA images. To achieve this, we employed ChatGPT to generate 4,110,581 QA pairs for a large FFA dataset comprising 654,343 FFA images from 9,392 participants. We then fine-tuned the Bootstrapping Language-Image Pre-training (BLIP) framework to handle vision and language simultaneously. The performance of the fine-tuned model (ChatFFA) was thoroughly evaluated through automated and manual assessments, as well as case studies based on an external validation set, demonstrating satisfactory results. In conclusion, our ChatFFA system paves the way for improved efficiency and feasibility in medical imaging analysis by leveraging generative large language models.
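The synthetic-data step described above can be illustrated with a minimal sketch. Note that the study used ChatGPT to generate its QA pairs; the template-based generator below is a simplified stand-in, and the report field names (`phase`, `finding`, `impression`) and question wordings are illustrative assumptions, not the authors' actual prompts or data schema.

```python
# Hypothetical sketch: turning structured FFA report fields into
# (question, answer) pairs for training a visual question-answering
# model. The study used ChatGPT for this step; the templates and
# field names here are assumptions for illustration only.
from typing import Dict, List, Tuple

# Each template pairs a question with the report field that answers it.
QA_TEMPLATES = [
    ("What imaging phase is shown in this FFA image?", "phase"),
    ("What abnormal finding is visible on this angiogram?", "finding"),
    ("What is the most likely diagnosis?", "impression"),
]

def build_qa_pairs(report: Dict[str, str]) -> List[Tuple[str, str]]:
    """Generate QA pairs from one report, skipping absent fields."""
    return [(q, report[field]) for q, field in QA_TEMPLATES if field in report]

# Example structured report (invented values for illustration).
example_report = {
    "phase": "late phase",
    "finding": "focal hyperfluorescent leakage near the macula",
    "impression": "central serous chorioretinopathy",
}

pairs = build_qa_pairs(example_report)
```

Applied across a full dataset of image-report pairs, this kind of generation scales the handful of templates (or, in the study's case, ChatGPT prompts) into millions of training examples.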