Comparative evaluation of a language model and human specialists in the application of European guidelines for the management of inflammatory bowel diseases and malignancies.

Affiliations

Department of Gastroenterology, Rambam Health Care Campus, Haifa, Israel.

Rappaport Faculty of Medicine, Technion, Israel Institute of Technology, Haifa, Israel.

Publication information

Endoscopy. 2024 Sep;56(9):706-709. doi: 10.1055/a-2289-5732. Epub 2024 Mar 18.

DOI:10.1055/a-2289-5732

PMID:38499197

Abstract

BACKGROUND

Society guidelines on colorectal dysplasia screening, surveillance, and endoscopic management in inflammatory bowel disease (IBD) are complex, and physician adherence to them is suboptimal. We aimed to evaluate the use of ChatGPT, a large language model, in generating accurate guideline-based recommendations for colorectal dysplasia screening, surveillance, and endoscopic management in IBD in line with European Crohn's and Colitis Organization (ECCO) guidelines.

METHODS

Thirty clinical scenarios in the form of free text were prepared and presented to three separate sessions of ChatGPT and to eight gastroenterologists (four IBD specialists and four non-IBD gastroenterologists). Two additional IBD specialists subsequently assessed all responses provided by ChatGPT and the eight gastroenterologists, judging their accuracy according to ECCO guidelines.

RESULTS

ChatGPT had a mean correct response rate of 87.8%. Among the eight gastroenterologists, the mean correct response rates were 85.8% for IBD experts and 89.2% for non-IBD experts. No statistically significant differences in accuracy were observed between ChatGPT and all gastroenterologists (P = 0.95), or between ChatGPT and the IBD experts and non-IBD expert gastroenterologists, respectively (P = 0.82).

CONCLUSIONS

This study highlights the potential of language models in enhancing guideline adherence regarding colorectal dysplasia in IBD. Further investigation of additional resources and prospective evaluation in real-world settings are warranted.

Similar articles

Knowledge and predictors of dysplasia surveillance performance in inflammatory bowel diseases in Australia.

Gastrointest Endosc. 2015 Oct;82(4):708-714.e4. doi: 10.1016/j.gie.2015.04.004. Epub 2015 May 23.

Adherence to ECCO Guidelines for Management of Iron Deficiency and Anemia in Inflammatory Bowel Diseases Among Israeli Adult and Pediatric Gastroenterologists.

J Pediatr Gastroenterol Nutr. 2023 Nov 1;77(5):634-639. doi: 10.1097/MPG.0000000000003913. Epub 2023 Aug 15.

Survey of barriers to adherence to international inflammatory bowel disease guidelines: does gastroenterologists' confidence translate to high adherence?

Intern Med J. 2022 Aug;52(8):1330-1338. doi: 10.1111/imj.15299. Epub 2022 May 31.

Adherence of gastroenterologists to European Crohn's and Colitis Organisation consensus on Crohn's disease: a real-life survey in Spain.

J Crohns Colitis. 2012 Aug;6(7):763-70. doi: 10.1016/j.crohns.2011.12.013. Epub 2012 Jan 27.

Accuracy of Information given by ChatGPT for Patients with Inflammatory Bowel Disease in Relation to ECCO Guidelines.

J Crohns Colitis. 2024 Aug 14;18(8):1215-1221. doi: 10.1093/ecco-jcc/jjae040.

Optimizing the quality of endoscopy in inflammatory bowel disease: focus on surveillance and management of colorectal dysplasia using interactive image- and video-based teaching.

Gastrointest Endosc. 2017 Dec;86(6):1107-1117.e1. doi: 10.1016/j.gie.2017.07.045. Epub 2017 Aug 15.

Clinicians' adherence to international guidelines in the clinical care of adults with inflammatory bowel disease.

Scand J Gastroenterol. 2017 May;52(5):536-542. doi: 10.1080/00365521.2017.1278785. Epub 2017 Jan 27.

Identifying the real-world challenges of dysplasia surveillance in inflammatory bowel disease: a retrospective cohort study in a tertiary health network.

Intern Med J. 2024 Jan;54(1):96-103. doi: 10.1111/imj.16102. Epub 2023 May 6.

Why don't gastroenterologists follow colon polyp surveillance guidelines?: results of a national survey.

J Clin Gastroenterol. 2009 Jul;43(6):554-8. doi: 10.1097/MCG.0b013e31818242ad.

Cited by

Large language models for clinical decision support in gastroenterology and hepatology.

Nat Rev Gastroenterol Hepatol. 2025 Aug 22. doi: 10.1038/s41575-025-01108-1.

Assessing ChatGPT-v4 for Guideline-Concordant Inflammatory Bowel Disease: Accuracy, Completeness, and Temporal Drift.

J Clin Med. 2025 Jun 29;14(13):4599. doi: 10.3390/jcm14134599.

Between hype and hard evidence: Are large language models ready for implementation in surveillance colonoscopy?

Endosc Int Open. 2025 Jun 17;13:a26047345. doi: 10.1055/a-2604-7345. eCollection 2025.

Large language models for disease diagnosis: a scoping review.

NPJ Artif Intell. 2025;1(1):9. doi: 10.1038/s44387-025-00011-z. Epub 2025 Jun 9.

Enhancing diagnostics: ChatGPT-4 performance in ulcerative colitis endoscopic assessment.

Endosc Int Open. 2025 Mar 14;13:a25420943. doi: 10.1055/a-2542-0943. eCollection 2025.

The global research of artificial intelligence on inflammatory bowel disease: A bibliometric analysis.

Digit Health. 2025 Mar 14;11:20552076251326217. doi: 10.1177/20552076251326217. eCollection 2025 Jan-Dec.

Examining the Accuracy and Reproducibility of Responses to Nutrition Questions Related to Inflammatory Bowel Disease by Generative Pre-trained Transformer-4.

Crohns Colitis 360. 2025 Feb 19;7(1):otae077. doi: 10.1093/crocol/otae077. eCollection 2025 Jan.

Inter-Rater Disagreements in Applying the Montreal Classification for Crohn's Disease: The Five-Nations Survey Study.

United European Gastroenterol J. 2025 Jun;13(5):685-696. doi: 10.1002/ueg2.12757. Epub 2025 Jan 18.
