Chen Xiaolan, Zhang Weiyi, Xu Pusheng, Zhao Ziwei, Zheng Yingfeng, Shi Danli, He Mingguang
School of Optometry, The Hong Kong Polytechnic University, Kowloon, Hong Kong, China.
State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangdong Provincial Clinical Research Center for Ocular Diseases, Guangzhou, China.
NPJ Digit Med. 2024 May 3;7(1):111. doi: 10.1038/s41746-024-01101-z.
Fundus fluorescein angiography (FFA) is a crucial diagnostic tool for chorioretinal diseases, but its interpretation requires significant expertise and time. Prior studies have used artificial intelligence (AI)-based systems to assist FFA interpretation, but these systems lack user interaction and comprehensive evaluation by ophthalmologists. Here, we used large language models (LLMs) to develop an automated interpretation pipeline for both report generation and medical question-answering (QA) on FFA images. The pipeline comprises two parts: an image-text alignment module (Bootstrapping Language-Image Pre-training, BLIP) for report generation and an LLM (Llama 2) for interactive QA. The model was developed using 654,343 FFA images with 9,392 reports. It was evaluated both automatically, using language-based and classification-based metrics, and manually by three experienced ophthalmologists. The automatic evaluation demonstrated that the system can generate coherent and comprehensible free-text reports, achieving a BERTScore of 0.70 and F1 scores ranging from 0.64 to 0.82 for detecting the top-5 retinal conditions. The manual evaluation revealed acceptable accuracy (68.3%, Kappa 0.746) and completeness (62.3%, Kappa 0.739) of the generated reports. The generated free-form answers were evaluated manually, with the majority meeting the ophthalmologists' criteria (error-free: 70.7%; complete: 84.0%; harmless: 93.7%; satisfactory: 65.3%; Kappa: 0.762-0.834). This study introduces an innovative framework that combines multi-modal transformers and LLMs, enhancing ophthalmic image interpretation and facilitating interactive communication during medical consultations.
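The two-stage design described above (a BLIP-style captioning model for report generation, a Llama 2 chat model for interactive QA) can be sketched with off-the-shelf Hugging Face components. This is a minimal illustration only: the public checkpoints, prompt wording, file name, and helper functions below are assumptions for demonstration, not the study's fine-tuned models or actual code.

```python
# Illustrative sketch of a two-stage FFA interpretation pipeline:
# (1) a BLIP image-captioning model drafts a free-text report,
# (2) a Llama 2 chat model answers follow-up questions about it.
import torch
from PIL import Image
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BlipForConditionalGeneration,
    BlipProcessor,
)

device = "cuda" if torch.cuda.is_available() else "cpu"

# Stage 1: image-text alignment module (BLIP) for report generation.
# Public base checkpoint used as a stand-in for the study's fine-tuned weights.
blip_processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
blip = BlipForConditionalGeneration.from_pretrained(
    "Salesforce/blip-image-captioning-base"
).to(device)

def generate_report(image_path: str) -> str:
    """Produce a free-text draft report for a single FFA frame."""
    image = Image.open(image_path).convert("RGB")
    inputs = blip_processor(images=image, return_tensors="pt").to(device)
    output_ids = blip.generate(**inputs, max_new_tokens=128)
    return blip_processor.decode(output_ids[0], skip_special_tokens=True)

# Stage 2: LLM (Llama 2 chat) for interactive QA, conditioned on the
# generated report text rather than on raw pixels.
llm_name = "meta-llama/Llama-2-7b-chat-hf"  # assumed public stand-in checkpoint
tokenizer = AutoTokenizer.from_pretrained(llm_name)
llm = AutoModelForCausalLM.from_pretrained(
    llm_name, torch_dtype=torch.float16
).to(device)

def answer_question(report: str, question: str) -> str:
    """Answer a clinician/patient question grounded in the generated report."""
    prompt = (
        f"[INST] You are an ophthalmology assistant. FFA report: {report}\n"
        f"Question: {question} [/INST]"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    output_ids = llm.generate(**inputs, max_new_tokens=256, do_sample=False)
    # Decode only the newly generated tokens, skipping the prompt.
    return tokenizer.decode(
        output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )

report = generate_report("ffa_frame.png")  # hypothetical input file
print(answer_question(report, "Is there evidence of macular leakage?"))
```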
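The abstract names two families of automatic metrics: language-based (BERTScore over generated versus reference reports) and classification-based (per-condition F1 for the top-5 retinal conditions). A minimal sketch of both, assuming the open-source bert-score and scikit-learn packages, follows; the toy report pair, condition labels, and keyword-matching detection rule are hypothetical stand-ins for the study's actual evaluation protocol.

```python
from bert_score import score as bert_score
from sklearn.metrics import f1_score

# Toy generated/reference report pair (hypothetical text, not study data).
generated = ["Late-phase hyperfluorescent leakage at the macula."]
reference = ["Macular leakage with late-phase hyperfluorescence."]

# Language-based metric: BERTScore F1 between generated and reference
# reports (the study reports an average of 0.70).
precision, recall, f1 = bert_score(generated, reference, lang="en")
print(f"BERTScore F1: {f1.mean().item():.2f}")

# Classification-based metric: per-condition F1, here derived by simple
# keyword matching on the report text (a stand-in detection rule).
conditions = ["leakage", "ischemia", "neovascularization"]  # hypothetical labels
y_true = [[1, 0, 0]]  # ground-truth conditions per report
y_pred = [[int(c in rpt.lower()) for c in conditions] for rpt in generated]
for i, cond in enumerate(conditions):
    score_i = f1_score([row[i] for row in y_true],
                       [row[i] for row in y_pred], zero_division=0)
    print(f"{cond}: F1 = {score_i:.2f}")
```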