Lahat Adi, Shachar Eyal, Avidan Benjamin, Glicksberg Benjamin, Klang Eyal
Chaim Sheba Medical Center, Department of Gastroenterology, Affiliated to Tel Aviv University, Tel Aviv 69978, Israel.
Mount Sinai Clinical Intelligence Center, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA.
Diagnostics (Basel). 2023 Jun 2;13(11):1950. doi: 10.3390/diagnostics13111950.
Patients frequently have concerns about their disease and find it challenging to obtain accurate information. OpenAI's ChatGPT chatbot (ChatGPT) is a new large language model developed to provide answers to a wide range of questions in various fields. Our aim was to evaluate the performance of ChatGPT in answering patients' questions regarding gastrointestinal health.
To evaluate the performance of ChatGPT in answering patients' questions, we used a representative sample of 110 real-life questions. The answers provided by ChatGPT were rated in consensus by three experienced gastroenterologists, who assessed each answer's accuracy, clarity, and efficacy.
ChatGPT was able to provide accurate and clear answers to patients' questions in some cases, but not in others. For questions about treatments, the average accuracy, clarity, and efficacy scores (on a scale of 1 to 5) were 3.9 ± 0.8, 3.9 ± 0.9, and 3.3 ± 0.9, respectively. For questions about symptoms, the average accuracy, clarity, and efficacy scores were 3.4 ± 0.8, 3.7 ± 0.7, and 3.2 ± 0.7, respectively. For questions about diagnostic tests, the average accuracy, clarity, and efficacy scores were 3.7 ± 1.7, 3.7 ± 1.8, and 3.5 ± 1.7, respectively.
While ChatGPT has potential as a source of medical information, further development is needed; the quality of its answers is contingent upon the quality of the online information from which it learns. These findings may help healthcare providers and patients alike understand the capabilities and limitations of ChatGPT.