University of Cambridge, Cambridge, UK.
Yong Loo Lin School of Medicine, National University of Singapore, Singapore.
Br J Ophthalmol. 2024 Sep 20;108(10):1362-1370. doi: 10.1136/bjo-2023-324734.
Large language models (LLMs) are fast emerging as potent tools in healthcare, including ophthalmology. This systematic review offers a twofold contribution: it summarises current trends in ophthalmology-related LLM research and projects future directions for this burgeoning field.
We systematically searched four databases (PubMed, Europe PMC, Scopus and Web of Science) for articles on LLM use in ophthalmology published between 1 January 2022 and 31 July 2023. Selected articles were summarised and categorised by type (editorial, commentary, original research, etc) and by research focus (eg, evaluating ChatGPT's performance in ophthalmology examinations or clinical tasks).
We identified 32 articles meeting our criteria, published between January and July 2023, with a peak in June (n=12). Most were original research evaluating LLMs' proficiency in clinically related tasks (n=9). Studies demonstrated that ChatGPT-4.0 outperformed its predecessor, ChatGPT-3.5, in ophthalmology exams. Furthermore, ChatGPT excelled in constructing discharge notes (n=2), evaluating diagnoses (n=2) and answering general medical queries (n=6). However, it struggled with generating scientific articles or abstracts (n=3) and answering specific subdomain questions, especially those regarding specific treatment options (n=2). ChatGPT's performance relative to other LLMs (Google's Bard, Microsoft's Bing) varied by study design. Ethical concerns such as data hallucination (n=27), authorship (n=5) and data privacy (n=2) were frequently cited.
While LLMs hold transformative potential for healthcare and ophthalmology, concerns over accountability, accuracy and data security remain. Future research should focus on application programming interface integration, comparative assessments of popular LLMs, their ability to interpret image-based data and the establishment of standardised evaluation frameworks.