Schumacher Inès, Bühler Virginie Manuela Marie, Jaggi Damian, Roth Janice
Department of Ophthalmology, Inselspital, University Hospital of Bern, Bern, Switzerland.
Moorfields Eye Hospital NHS Foundation Trust, City Road, London, EC1V 2PD, UK.
Int J Retina Vitreous. 2024 Sep 11;10(1):63. doi: 10.1186/s40942-024-00581-1.
Uveitis is the ophthalmic subfield dealing with a broad range of intraocular inflammatory diseases. Given the rising importance of large language models (LLMs) such as ChatGPT and their potential use in medicine, this study explores the strengths and weaknesses of their applicability in the subfield of uveitis.
A series of highly clinically relevant questions regarding current uveitis cases was put to the LLM three consecutive times (attempts 1, 2 and 3). Each answer was classified as accurate and sufficient, partially accurate and sufficient, or inaccurate and insufficient. Statistical analysis included descriptive analysis, a test for normality of distribution, a non-parametric test and reliability tests. The references supplied by the LLM were checked for correctness against different medical databases.
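As an illustration of the reliability testing mentioned above, the following is a minimal sketch of Cohen's kappa for two attempts graded on the same questions. The function name `cohens_kappa`, the numeric coding of the three answer categories and the example grades are all hypothetical; they are not the study's data, and the abstract does not say which software the authors used.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two equally long sequences of categorical ratings."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(ratings_a)
    # Observed agreement: share of questions graded identically in both attempts.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: product of the marginal proportions, summed over categories.
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both attempts used one identical category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical grades (assumed coding, not from the study):
# 2 = accurate and sufficient, 1 = partially accurate, 0 = inaccurate
attempt_1 = [2, 2, 1, 0, 2, 1, 2, 0]
attempt_2 = [2, 1, 1, 0, 2, 2, 2, 0]
kappa = cohens_kappa(attempt_1, attempt_2)
print(f"Cohen's kappa, attempt 1 vs attempt 2: {kappa:.4f}")
```

Kappa corrects the raw share of matching grades for the agreement expected by chance, which is why it is preferred over simple percent agreement when a few categories dominate.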
The data were not normally distributed. The three attempts were comparable (Kruskal-Wallis H test, p = 0.7338). There was moderate agreement between attempt 1 and attempt 2 (Cohen's kappa, κ = 0.5172) as well as between attempt 2 and attempt 3 (Cohen's kappa, κ = 0.4913). There was fair agreement between attempt 1 and attempt 3 (Cohen's kappa, κ = 0.3647). The average pairwise agreement was moderate (Cohen's kappa, κ = 0.4577). Across all three attempts together, agreement was moderate (Fleiss' kappa, κ = 0.4534). The LLM generated a total of 52 references. Of these, 22 (42.3%) were accurate and correctly cited. Another 22 (42.3%) could not be located in any of the searched databases. The remaining 8 (15.4%) existed but were either misinterpreted or incorrectly cited by the LLM.
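The reported figures can be cross-checked with a few lines of arithmetic. The sketch below recomputes the average pairwise kappa and the reference percentages from the values stated above. The verbal labels follow the commonly used Landis and Koch bands, which is an assumption: the abstract does not name the scale it applied, although its "fair" and "moderate" labels are consistent with it. The helper name `landis_koch` is hypothetical.

```python
# Pairwise Cohen's kappa values as reported in the results.
pairwise_kappa = {"1 vs 2": 0.5172, "2 vs 3": 0.4913, "1 vs 3": 0.3647}
mean_kappa = sum(pairwise_kappa.values()) / len(pairwise_kappa)


def landis_koch(kappa):
    """Verbal label for a kappa value (Landis and Koch bands, assumed here)."""
    if kappa < 0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


for pair, value in pairwise_kappa.items():
    print(f"attempt {pair}: kappa = {value:.4f} ({landis_koch(value)})")
print(f"mean pairwise kappa = {mean_kappa:.4f} ({landis_koch(mean_kappa)})")

# Reference counts as reported in the results.
references = {"accurate": 22, "not found": 22, "exists but miscited": 8}
total = sum(references.values())
shares = {label: round(100 * n / total, 1) for label, n in references.items()}
incorrect_share = round(100 * (total - references["accurate"]) / total, 1)
print(shares, f"not usable as cited: {incorrect_share}%")
```

The three pairwise values average to the reported 0.4577, and the two problematic reference groups together account for 30 of the 52 references.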
Our results demonstrate the significant potential of LLMs in uveitis. However, their implementation requires rigorous training and comprehensive testing for specific medical tasks. We also found that the references provided by ChatGPT 4.o were incorrect in most cases (30 of 52, 57.7%, could not be located or were miscited). LLMs are likely to become invaluable tools in shaping the future of ophthalmology, enhancing clinical decision-making and patient care.