Reliability and accuracy of artificial intelligence ChatGPT in providing information on ophthalmic diseases and management to patients.
AUTHOR INFORMATION
Retina Service, Wills Eye Hospital, Thomas Jefferson University, Philadelphia, PA, USA.
Ocular Oncology Service, Wills Eye Hospital, Thomas Jefferson University, Philadelphia, PA, USA.
PUBLICATION INFORMATION
Eye (Lond). 2024 May;38(7):1368-1373. doi: 10.1038/s41433-023-02906-0. Epub 2024 Jan 20.
PURPOSE
To assess the accuracy of ophthalmic information provided by an artificial intelligence chatbot (ChatGPT).
METHODS
Five diseases from each of 8 subspecialties of Ophthalmology were assessed using ChatGPT version 3.5. For each disease, ChatGPT was asked three questions: What is x? How is x diagnosed? How is x treated? (x = name of the disease). Responses were graded by comparing them to the American Academy of Ophthalmology (AAO) guidelines for patients, with scores ranging from -3 (unvalidated and potentially harmful to a patient's health or well-being if they pursue such a suggestion) to 2 (correct and complete).
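The abstract describes the prompting protocol only in outline. The sketch below (Python) illustrates how the three question templates could be generated and submitted programmatically; the subspecialty and disease names are hypothetical placeholders rather than the study's actual list, and the use of the OpenAI chat API with the gpt-3.5-turbo model is an assumption, since the abstract states only that "ChatGPT version 3.5" was queried.

```python
# Illustrative sketch only: the disease list is a placeholder, not the study's
# actual selection, and querying via the OpenAI API is an assumption.
from openai import OpenAI  # pip install openai

QUESTION_TEMPLATES = [
    "What is {x}?",
    "How is {x} diagnosed?",
    "How is {x} treated?",
]

# Hypothetical subspecialty -> disease mapping (the study used 5 diseases per
# subspecialty; only two subspecialties with two diseases each are shown here).
DISEASES = {
    "Retina": ["age-related macular degeneration", "diabetic retinopathy"],
    "Glaucoma": ["primary open-angle glaucoma", "angle-closure glaucoma"],
}

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def ask_chatgpt(question: str) -> str:
    """Send a single patient-style question and return the reply text."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # assumed stand-in for "ChatGPT version 3.5"
        messages=[{"role": "user", "content": question}],
    )
    return response.choices[0].message.content


if __name__ == "__main__":
    for subspecialty, diseases in DISEASES.items():
        for disease in diseases:
            for template in QUESTION_TEMPLATES:
                question = template.format(x=disease)
                answer = ask_chatgpt(question)
                # Each answer would then be graded against AAO patient
                # guidelines on the -3 (potentially harmful) to 2
                # (correct and complete) scale described above.
                print(subspecialty, "|", question, "->", answer[:80])
```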
MAIN OUTCOMES
Accuracy of ChatGPT responses to prompts about ophthalmic health information, measured as scores on a scale from -3 to 2.
RESULTS
Of the 120 questions, 93 (77.5%) scored ≥ 1 and 27 (22.5%) scored ≤ -1; among the latter, 9 (7.5%) received a score of -3. The overall median score across all subspecialties was 2 for the question "What is x", 1.5 for "How is x diagnosed", and 1 for "How is x treated", although these differences did not reach statistical significance by Kruskal-Wallis testing.
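For the statistical comparison above, a Kruskal-Wallis H-test checks whether the score distributions for the three question types differ. The sketch below (Python with SciPy) shows how such a comparison could be run; the score vectors are illustrative placeholders only, not the study's data.

```python
# Illustrative only: placeholder score vectors, not the study's actual data.
from statistics import median
from scipy.stats import kruskal

# One score per disease for each question type (grading scale: -3 to 2).
scores_what = [2, 2, 1, 2, -1, 2, 1, 2]        # "What is x?"
scores_diagnosed = [2, 1, 1, 2, -3, 1, 2, 1]   # "How is x diagnosed?"
scores_treated = [1, 1, -1, 2, 1, -3, 1, 1]    # "How is x treated?"

for label, scores in [("What is x", scores_what),
                      ("How is x diagnosed", scores_diagnosed),
                      ("How is x treated", scores_treated)]:
    print(f"median score for '{label}': {median(scores)}")

# Kruskal-Wallis H-test: do the three score distributions come from the same
# population? A p-value above the significance threshold (as reported in the
# abstract) means the apparent differences in medians are not significant.
statistic, p_value = kruskal(scores_what, scores_diagnosed, scores_treated)
print(f"H = {statistic:.3f}, p = {p_value:.3f}")
```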
CONCLUSIONS
Despite the generally positive scores, ChatGPT on its own still provides incomplete, incorrect, and potentially harmful information about common ophthalmic conditions, where harmful is defined as recommending invasive procedures or other interventions with potential for adverse sequelae that are not supported by the AAO for the disease in question. ChatGPT may be a valuable adjunct to patient education, but it is currently not sufficient without concomitant human medical supervision.