Zhao Xufeng, Li Chunshi, Yang Jingyuan, Gu Xingwang, Li Bing, Wang Yuelin, Zhang Bi-Lei, Li Xirong, Zhao Jianchun, Wang Jie, Yu Weihong
Department of Ophthalmology, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences, Peking Union Medical College, Beijing, China.
Beijing Key Laboratory of Fundus Diseases Intelligent Diagnosis & Drug/Device Development and Translation, Beijing, China.
Br J Ophthalmol. 2025 Aug 20;109(9):1036-1042. doi: 10.1136/bjo-2024-326064.
To investigate rule-based and deep learning (DL)-based methods for automatically generating natural language diagnostic reports for macular diseases.
This diagnostic study collected ophthalmic images of 2261 eyes from 1303 patients. Colour fundus photographs and optical coherence tomography images were obtained. Eyes without retinal diseases and eyes diagnosed with one of four macular diseases were included. For each eye, a diagnostic report was written in a fixed format consisting of lesion descriptions, diagnoses and recommendations. Subsequently, a rule-based natural language processing (NLP) system and a DL-based NLP system were developed to generate diagnostic reports automatically. To assess the effectiveness of these models, two junior ophthalmologists independently wrote diagnostic reports for the collected images. A questionnaire was designed, and two retina specialists used it to grade each report's readability, correctness of diagnosis, lesion description and recommendations.
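The rule-based approach described above can be illustrated with a minimal sketch in Python. It is not the authors' implementation: the `EyeFindings` structure, the label keys, and every phrase in the rule tables are hypothetical placeholders, and it assumes an upstream image model has already produced a diagnosis label and a list of lesion labels for the eye. Only the three-part report format (lesion description, diagnosis, recommendation) comes from the text.

```python
from dataclasses import dataclass, field


@dataclass
class EyeFindings:
    """Hypothetical output of an upstream image classifier for one eye."""
    diagnosis: str                               # label key, e.g. "disease_a"
    lesions: list[str] = field(default_factory=list)


# Placeholder rule tables. The keys and sentences are illustrative only
# and are not taken from the study.
LESION_PHRASES = {
    "lesion_x": "Lesion X is visible in the macular region.",
    "lesion_y": "Lesion Y is visible in the outer retina.",
}

DIAGNOSIS_RULES = {
    "normal": (
        "No retinal disease detected.",
        "Routine follow-up is suggested.",
    ),
    "disease_a": (
        "Findings are consistent with macular disease A.",
        "Referral to a retina specialist is suggested.",
    ),
}


def generate_report(findings: EyeFindings) -> str:
    """Fill the fixed three-part template from the rule tables."""
    if findings.diagnosis not in DIAGNOSIS_RULES:
        raise ValueError(f"no rule for diagnosis {findings.diagnosis!r}")
    diagnosis_text, advice_text = DIAGNOSIS_RULES[findings.diagnosis]

    # Keep only lesions that have a phrase rule; unknown labels are skipped.
    phrases = [LESION_PHRASES[name] for name in findings.lesions
               if name in LESION_PHRASES]
    description = " ".join(phrases) if phrases else "No abnormal lesions are described."

    return "\n".join([
        f"Lesion description: {description}",
        f"Diagnosis: {diagnosis_text}",
        f"Recommendation: {advice_text}",
    ])


if __name__ == "__main__":
    print(generate_report(EyeFindings("disease_a", ["lesion_x", "lesion_y"])))
```

A lookup-table design like this keeps every generated sentence traceable to an explicit rule, which is the usual trade-off against a learned generator that can phrase reports more freely but is harder to audit.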
The rule-based NLP reports received higher grades than those of junior ophthalmologists in correctness of diagnosis (9.13±1.52 vs 9.03±1.42 points) and recommendations (8.55±2.74 vs 8.50±2.53 points). The DL-based NLP reports received slightly lower grades than those of junior ophthalmologists in lesion description (8.82±1.84 vs 9.12±1.20 points, p<0.05), correctness of diagnosis (8.72±2.36 vs 9.08±1.55 points, p<0.05) and recommendations (8.81±2.52 vs 9.15±1.65 points, p<0.05). For readability, the DL-based reports scored marginally higher than those of junior ophthalmologists (9.98±0.17 vs 9.94±0.25 points), a difference that did not reach significance at the 0.05 level (p=0.094).
The multimodal AI system, coupled with the NLP algorithm, demonstrated competence in generating reports for four macular diseases when compared with junior ophthalmologists.