Giannuzzi Federico, Carlà Matteo Mario, Hu Lorenzo, Cestrone Valentina, Caputo Carmela Grazia, Sammarco Maria Grazia, Savino Gustavo, Rizzo Stanislao, Blasi Maria Antonietta, Pagliara Monica Maria
Ophthalmology Department, "Fondazione Policlinico Universitario A. Gemelli, IRCCS", 00168, Rome, Italy.
Ophthalmology Department, Catholic University "Sacro Cuore", Largo A. Gemelli, 8, Rome, Italy.
Int Ophthalmol. 2025 Feb 7;45(1):59. doi: 10.1007/s10792-024-03399-w.
To evaluate ChatGPT's ability to analyze comprehensive case descriptions of patients with uveal melanoma and provide recommendations for the most appropriate management.
Retrospective analysis of ocular oncology patients' medical records.
Forty patients treated for uveal melanoma between May 2019 and October 2023.
We uploaded each case description into the ChatGPT interface (version 4.0) and asked, "What type of treatment do you recommend?" to elicit realistic treatment options. The accuracy of ChatGPT's recommendations was compared with the treatments recorded in patients' files and with the recommendations of three ocular oncologists, each with more than 10 years of experience.
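The abstract describes querying via the ChatGPT web interface rather than a programmatic pipeline; as a minimal sketch, the same step could be reproduced with the OpenAI Python SDK. The model identifier and the case_description variable below are assumptions for illustration, not details from the study.

```python
# Sketch of the querying step using the OpenAI Python SDK.
# The study used the ChatGPT web interface (version 4.0); the "gpt-4"
# model name and case_description below are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

case_description = "..."  # one anonymized uveal melanoma case summary

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {
            "role": "user",
            "content": case_description
            + "\nWhat type of treatment do you recommend?",
        },
    ],
)
print(response.choices[0].message.content)
```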
The primary objective of this research was to assess the accuracy of ChatGPT's responses in ocular oncology cases, examining its competence in both straightforward and complex situations. The secondary objective was to evaluate the concordance between ChatGPT's responses and those of ocular oncology specialists faced with the same clinical scenarios.
ChatGPT's treatment choices matched those recorded in patients' files in 55% of cases (22 of 40). The three ocular oncology specialists agreed with ChatGPT's recommendations in 50%, 55%, and 57% of cases, respectively. Measured against the patients' files, ChatGPT's responses differed significantly from those of the three specialists (p = 0.003, p = 0.001, and p = 0.001). When enucleation cases were excluded, ChatGPT's recommendations matched the patients' files in 18 of 24 cases (75%) and matched the three specialists in 17/24, 18/24, and 18/24 cases (agreements of 70%, 75%, and 75%, respectively). In this subgroup, ChatGPT's decisions did not differ significantly from those of the three specialists (p = 0.50, p = 0.36, and p = 0.36 versus specialists 1, 2, and 3).
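The abstract reports agreement percentages and p-values without naming the statistical test. As a minimal sketch, the comparison could be framed as a paired analysis of per-case binary match indicators (recommendation matches the patient file or not) evaluated with a McNemar-style test; the test choice and the placeholder data below are assumptions, not the study's actual method or data.

```python
# Sketch of an agreement analysis on per-case binary match indicators
# (1 = recommendation matches the patient file, 0 = mismatch).
# The McNemar test and the random placeholder data are assumptions.
import numpy as np
from statsmodels.stats.contingency_tables import mcnemar

rng = np.random.default_rng(0)
chatgpt_match = rng.integers(0, 2, size=40)     # placeholder, not study data
specialist_match = rng.integers(0, 2, size=40)  # placeholder, not study data

# Concordance with the patient file, e.g. 22/40 = 55% for ChatGPT.
print(f"ChatGPT concordance: {chatgpt_match.mean():.0%}")

# Paired comparison of the two raters against the same reference:
# a 2x2 table of (ChatGPT match, specialist match) counts per case.
table = np.zeros((2, 2), dtype=int)
for c, s in zip(chatgpt_match, specialist_match):
    table[c, s] += 1
print(mcnemar(table, exact=True))
```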
ChatGPT exhibited a level of proficiency comparable to that of trained ocular oncology specialists. However, it showed its limitations in more complex scenarios, such as extrascleral extension or infiltration of the optic nerve, in which a comprehensive evaluation of the patient is necessary.