Royal Free London NHS Foundation Trust, London, United Kingdom.
Clarunis - University Center for Gastrointestinal and Liver Diseases, Basel, Switzerland.
J Med Internet Res. 2023 Jun 30;25:e47479. doi: 10.2196/47479.
ChatGPT-4 is the latest release of a novel artificial intelligence (AI) chatbot capable of answering freely formulated and complex questions. In the near future, ChatGPT could become the new standard for health care professionals and patients to access medical information. However, little is known about the quality of the medical information it provides.
We aimed to assess the reliability of medical information provided by ChatGPT.
Medical information provided by ChatGPT-4 on the 5 hepato-pancreatico-biliary (HPB) conditions with the highest global disease burden was assessed with the Ensuring Quality Information for Patients (EQIP) tool. The EQIP tool measures the quality of internet-available information and consists of 36 items divided into 3 subsections. In addition, 5 guideline recommendations per analyzed condition were rephrased as questions and input to ChatGPT, and agreement between the guidelines and the AI answers was rated by 2 authors independently. All queries were repeated 3 times to measure the internal consistency of ChatGPT.
Five conditions were identified (gallstone disease, pancreatitis, liver cirrhosis, pancreatic cancer, and hepatocellular carcinoma). The median EQIP score across all conditions was 16 (IQR 14.5-18) of a possible 36 items. By subsection, median scores for content, identification, and structure were 10 (IQR 9.5-12.5), 1 (IQR 1-1), and 4 (IQR 4-5), respectively. Agreement between guideline recommendations and answers provided by ChatGPT was 60% (15/25). Interrater agreement as measured by the Fleiss κ was 0.78 (P<.001), indicating substantial agreement. Internal consistency of the answers provided by ChatGPT was 100%.
ChatGPT provides medical information of quality comparable to that of available static internet information. Although currently of limited quality, large language models could become the future standard for patients and health care professionals to gather medical information.