Department of Gastroenterology, Rambam Health Care Campus, Haifa, Israel.
Rappaport Faculty of Medicine, Technion, Israel Institute of Technology, Haifa, Israel.
Endoscopy. 2024 Sep;56(9):706-709. doi: 10.1055/a-2289-5732. Epub 2024 Mar 18.
Society guidelines on colorectal dysplasia screening, surveillance, and endoscopic management in inflammatory bowel disease (IBD) are complex, and physician adherence to them is suboptimal. We aimed to evaluate the use of ChatGPT, a large language model, in generating accurate guideline-based recommendations for colorectal dysplasia screening, surveillance, and endoscopic management in IBD in line with European Crohn's and Colitis Organization (ECCO) guidelines.
Thirty free-text clinical scenarios were prepared and presented to three separate sessions of ChatGPT and to eight gastroenterologists (four IBD specialists and four non-IBD gastroenterologists). Two additional IBD specialists subsequently assessed all responses provided by ChatGPT and the eight gastroenterologists, judging their accuracy according to ECCO guidelines.
ChatGPT had a mean correct response rate of 87.8%. Among the eight gastroenterologists, the mean correct response rates were 85.8% for IBD experts and 89.2% for non-IBD experts. No statistically significant differences in accuracy were observed between ChatGPT and all gastroenterologists (P = 0.95), or between ChatGPT and the IBD experts and non-IBD expert gastroenterologists, respectively (P = 0.82).
This study highlights the potential of language models in enhancing guideline adherence regarding colorectal dysplasia in IBD. Further investigation of additional resources and prospective evaluation in real-world settings are warranted.