Assessing ChatGPT-v4 for Guideline-Concordant Inflammatory Bowel Disease: Accuracy, Completeness, and Temporal Drift.

Authors

Ozturk Oguz, Ergul Mucahit, Cagir Yavuz, Atay Ali, Acun Kadir Can, Coskun Orhan, Tenlik Ilyas, Durak Muhammed Bahaddin, Yuksel Ilhami

Affiliations

Department of Gastroenterology, Ankara Bilkent City Hospital, Ankara 06170, Turkey.

Department of Gastroenterology, Ankara Yildirim Beyazit University Yenimahalle Training and Research Hospital, Ankara 06560, Turkey.

Publication information

J Clin Med. 2025 Jun 29;14(13):4599. doi: 10.3390/jcm14134599.

DOI: 10.3390/jcm14134599

PMID: 40648973

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12250039/

Abstract

Chat Generative Pretrained Transformer (ChatGPT) is a useful resource for individuals working in healthcare. This paper examines how accurately ChatGPT-4 handles the diagnosis and treatment of ulcerative colitis (UC) and Crohn's disease (CD) when measured against the guidelines set out by the European Crohn's and Colitis Organization (ECCO). A 102-question survey was developed to assess the precision and consistency of the responses regarding UC and CD. The questionnaire combined true/false and multiple-choice questions, with the aim of simulating real-life scenarios while adhering to the ECCO guidelines. We used Likert scales to assess the responses. The questions were put to ChatGPT-4 on Day 1, Day 15, and Day 180. The 51 true/false items remained stable over the six-month period: accuracy was 92.8% at baseline and on Day 15, and peaked at 98.0% on Day 180, a change with a negligible effect size. Accuracy on the multiple-choice questions was 90.2% on Day 1, reached its highest point of 92.2% on Day 15, and then fell to 84.3% on Day 180. However, the reliability of the data was found to be suboptimal, and the impact was deemed negligible. A modest, transient increase in performance was observed at Day 15 and had diminished by Day 180, resulting in negligible effect sizes. ChatGPT-4 shows potential as a clinical decision support system for UC and CD, but its performance is marked by temporal variability and inconsistent execution across tasks. Before artificial intelligence (AI) technology is involved in inflammatory bowel disease (IBD) trials, essential steps include routine revalidation, multi-rater comparisons, prompt standardization, and a comprehensive understanding of the model's limitations.
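
The multiple-choice percentages can be sanity-checked against whole-number counts. The following is a minimal sketch, not the authors' analysis. It assumes that the 102 questions minus the 51 true/false items leaves 51 multiple-choice items, a split the abstract does not state explicitly, and the helper name `implied_correct` is hypothetical. Under that assumption, it converts each reported accuracy back to an implied number of correct answers and recomputes the percentage.

```python
# Assumption: 102 total questions minus 51 true/false items leaves 51
# multiple-choice items. The abstract does not state this split explicitly.
N_MULTIPLE_CHOICE = 102 - 51

# Multiple-choice accuracies (percent) as reported in the abstract.
reported = {"Day 1": 90.2, "Day 15": 92.2, "Day 180": 84.3}


def implied_correct(percent: float, n_items: int) -> int:
    """Return the whole number of correct answers closest to a reported percentage."""
    return round(percent / 100 * n_items)


# Implied correct-answer counts per time point (hypothetical reconstruction).
implied = {day: implied_correct(pct, N_MULTIPLE_CHOICE) for day, pct in reported.items()}

for day, count in implied.items():
    recomputed = 100 * count / N_MULTIPLE_CHOICE
    print(f"{day}: {count}/{N_MULTIPLE_CHOICE} correct = {recomputed:.1f}%")
```

If the 51-item assumption holds, the decline from Day 15 to Day 180 corresponds to four fewer correct answers (47 versus 43), which puts the percentage change in context.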
