Reliability and accuracy of artificial intelligence ChatGPT in providing information on ophthalmic diseases and management to patients.
AUTHOR INFORMATION
Retina Service, Wills Eye Hospital, Thomas Jefferson University, Philadelphia, PA, USA.
Ocular Oncology Service, Wills Eye Hospital, Thomas Jefferson University, Philadelphia, PA, USA.
PUBLICATION INFORMATION
Eye (Lond). 2024 May;38(7):1368-1373. doi: 10.1038/s41433-023-02906-0. Epub 2024 Jan 20.
PURPOSE
To assess the accuracy of ophthalmic information provided by an artificial intelligence chatbot (ChatGPT).
METHODS
Five diseases from each of 8 subspecialties of Ophthalmology were assessed using ChatGPT version 3.5. For each disease, ChatGPT was asked three questions: What is x? How is x diagnosed? How is x treated? (x = name of the disease). Responses were graded by comparing them to the American Academy of Ophthalmology (AAO) guidelines for patients, with scores ranging from -3 (unvalidated and potentially harmful to a patient's health or well-being if they pursue such a suggestion) to 2 (correct and complete).
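The abstract describes the prompting protocol only in outline. The sketch below (Python) illustrates how the three question templates could be generated and submitted programmatically; the subspecialty and disease names are hypothetical placeholders rather than the study's actual list, and the use of the OpenAI chat API with the gpt-3.5-turbo model is an assumption, since the abstract states only that "ChatGPT version 3.5" was queried.

```python
# Illustrative sketch only: the disease list is a placeholder, not the study's
# actual selection, and querying via the OpenAI API is an assumption.
from openai import OpenAI  # pip install openai

QUESTION_TEMPLATES = [
    "What is {x}?",
    "How is {x} diagnosed?",
    "How is {x} treated?",
]

# Hypothetical subspecialty -> disease mapping (the study used 5 diseases per
# subspecialty; only two subspecialties with two diseases each are shown here).
DISEASES = {
    "Retina": ["age-related macular degeneration", "diabetic retinopathy"],
    "Glaucoma": ["primary open-angle glaucoma", "angle-closure glaucoma"],
}

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def ask_chatgpt(question: str) -> str:
    """Send a single patient-style question and return the reply text."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # assumed stand-in for "ChatGPT version 3.5"
        messages=[{"role": "user", "content": question}],
    )
    return response.choices[0].message.content


if __name__ == "__main__":
    for subspecialty, diseases in DISEASES.items():
        for disease in diseases:
            for template in QUESTION_TEMPLATES:
                question = template.format(x=disease)
                answer = ask_chatgpt(question)
                # Each answer would then be graded against AAO patient
                # guidelines on the -3 (potentially harmful) to 2
                # (correct and complete) scale described above.
                print(subspecialty, "|", question, "->", answer[:80])
```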
MAIN OUTCOMES
Accuracy of ChatGPT responses to prompts about ophthalmic health information, measured as scores on a scale from -3 to 2.
RESULTS
Of the 120 questions, 93 (77.5%) scored ≥ 1 and 27 (22.5%) scored ≤ -1; among the latter, 9 (7.5%) received a score of -3. The overall median score across all subspecialties was 2 for the question "What is x", 1.5 for "How is x diagnosed", and 1 for "How is x treated", although these differences did not reach statistical significance by Kruskal-Wallis testing.
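For the statistical comparison above, a Kruskal-Wallis H-test checks whether the score distributions for the three question types differ. The sketch below (Python with SciPy) shows how such a comparison could be run; the score vectors are illustrative placeholders only, not the study's data.

```python
# Illustrative only: placeholder score vectors, not the study's actual data.
from statistics import median
from scipy.stats import kruskal

# One score per disease for each question type (grading scale: -3 to 2).
scores_what = [2, 2, 1, 2, -1, 2, 1, 2]        # "What is x?"
scores_diagnosed = [2, 1, 1, 2, -3, 1, 2, 1]   # "How is x diagnosed?"
scores_treated = [1, 1, -1, 2, 1, -3, 1, 1]    # "How is x treated?"

for label, scores in [("What is x", scores_what),
                      ("How is x diagnosed", scores_diagnosed),
                      ("How is x treated", scores_treated)]:
    print(f"median score for '{label}': {median(scores)}")

# Kruskal-Wallis H-test: do the three score distributions come from the same
# population? A p-value above the significance threshold (as reported in the
# abstract) means the apparent differences in medians are not significant.
statistic, p_value = kruskal(scores_what, scores_diagnosed, scores_treated)
print(f"H = {statistic:.3f}, p = {p_value:.3f}")
```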
CONCLUSIONS
Despite the generally positive scores, ChatGPT on its own still provides incomplete, incorrect, and potentially harmful information about common ophthalmic conditions, where harmful is defined as recommending invasive procedures or other interventions with potential for adverse sequelae that are not supported by the AAO for the disease in question. ChatGPT may be a valuable adjunct to patient education, but it is currently not sufficient without concomitant human medical supervision.