Department of Gastroenterology, Jiangxi Medical College, The First Affiliated Hospital, Digestive Disease Hospital, Nanchang University, Nanchang, Jiangxi, China.
Department of Surgery, The Chinese University of Hong Kong, Hong Kong, China.
Helicobacter. 2024 Jul-Aug;29(4):e13116. doi: 10.1111/hel.13116.
ChatGPT is a novel online large language model that can serve as a source of up-to-date, useful health-related knowledge for patients and clinicians. However, its performance on Helicobacter pylori infection-related questions remains unknown. This study aimed to evaluate the accuracy of ChatGPT's responses to H. pylori-related questions compared with that of gastroenterologists surveyed during the same period.
Twenty-five H. pylori-related questions spanning five domains (Indication, Diagnostics, Treatment, Gastric cancer and prevention, and Gut Microbiota) were selected based on the Maastricht VI Consensus report. Each question was posed three times to ChatGPT3.5 and ChatGPT4. Two independent H. pylori experts assessed the ChatGPT responses, with discrepancies resolved by a third reviewer. Simultaneously, a nationwide survey with the same questions was conducted among 1279 gastroenterologists and 154 medical students. The accuracy of the ChatGPT3.5 and ChatGPT4 responses was compared with that of the gastroenterologists.
Overall, both ChatGPT3.5 and ChatGPT4 demonstrated high accuracy, with a median accuracy rate of 92% for each of the three rounds of responses, surpassing the accuracy of the nationwide gastroenterologists (median: 80%) and equaling that of senior gastroenterologists. Compared with ChatGPT3.5, ChatGPT4 provided more concise responses at the same accuracy. ChatGPT3.5 performed well in the Indication, Treatment, and Gut Microbiota domains, whereas ChatGPT4 excelled in the Diagnostics, Gastric cancer and prevention, and Gut Microbiota domains.
ChatGPT exhibited high accuracy and reproducibility in addressing H. pylori-related questions, with the exception of decisions on H. pylori treatment, performing at the level of senior gastroenterologists, and could serve as an auxiliary information tool for patients and clinicians.