Belenje Akash, Pandya Dhanush, Jalali Subhadra, Rani Padmaja K
Srimati Kanuri Santhamma Center for Vitreo-Retinal Diseases, Anant Bajaj Retina Institute, Kallam Anji Reddy Campus, L V Prasad Eye Institute, Hyderabad, IND.
Cureus. 2025 Feb 5;17(2):e78597. doi: 10.7759/cureus.78597. eCollection 2025 Feb.
Objective: The aim of this study was to compare the accuracy of the ChatGPT artificial intelligence (AI) model with that of clinicians in real-life case scenarios related to retinopathy of prematurity (ROP).
Methods: This was a prospectively conducted study using a questionnaire of real-life case scenarios with multiple-response answers. Thirteen clinicians, comprising eight vitreoretinal fellowship trainees (less than two years of experience in ROP management) and five ROP experts (more than three years of experience in ROP management), were given 10 real-life case scenarios in ROP. The majority responses of the trainees and ROP experts were compared with the ChatGPT AI-generated responses. The ChatGPT exercise was repeated for both versions 3.5 and 4.0 more than a month apart, on May 29, 2024, and July 18, 2024, to check the consistency of the majority AI response. For each case scenario, the majority clinician response was compared with the majority AI response for agreement.
Results: ChatGPT answered nine of the 10 cases correctly (90%), outperforming the fellowship trainees (77.5%; 62 of 80 responses correct). The accuracy of the ROP experts was the highest at 96% (48 of 50 responses correct). There was substantial agreement between the majority clinician responses and the ChatGPT responses, with a Cohen's kappa of 0.80.
Conclusion: The ChatGPT AI model showed substantial agreement with the majority of clinician responses and performed better than the vitreoretinal fellowship trainees. ChatGPT is a promising new software tool that can be explored further for use in real-life case scenarios in ROP. A more precise prompt that specifies the screening guidelines to follow can elicit more accurate answers from ChatGPT in line with the requested guidelines.
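For readers unfamiliar with the agreement statistic reported above, the sketch below shows how a Cohen's kappa between majority clinician responses and majority AI responses could be computed; on the Landis and Koch scale, values of 0.61-0.80 denote substantial agreement, matching the abstract's wording. The per-case labels are hypothetical placeholders (the study's individual responses are not reproduced here), and the use of scikit-learn's cohen_kappa_score is an assumption, not the authors' actual analysis pipeline.

```python
# Minimal sketch: Cohen's kappa between the majority clinician response
# and the majority AI response across 10 case scenarios.
# The labels below are hypothetical placeholders, NOT the study's data.
from sklearn.metrics import cohen_kappa_score

# One categorical management decision per case scenario (hypothetical).
clinician_majority = ["treat", "observe", "treat", "screen", "observe",
                      "treat", "screen", "observe", "treat", "screen"]
chatgpt_majority   = ["treat", "observe", "treat", "screen", "observe",
                      "treat", "screen", "treat", "treat", "screen"]

# Kappa corrects observed agreement p_o for agreement expected by
# chance p_e:  kappa = (p_o - p_e) / (1 - p_e)
kappa = cohen_kappa_score(clinician_majority, chatgpt_majority)
print(f"Cohen's kappa: {kappa:.2f}")
```

Because kappa discounts chance agreement, it is a stricter measure than raw percent agreement, which is why the abstract reports it alongside the simple accuracy figures.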