Assessing ChatGPT-v4 for Guideline-Concordant Inflammatory Bowel Disease: Accuracy, Completeness, and Temporal Drift.

Author information

Ozturk Oguz, Ergul Mucahit, Cagir Yavuz, Atay Ali, Acun Kadir Can, Coskun Orhan, Tenlik Ilyas, Durak Muhammed Bahaddin, Yuksel Ilhami

Affiliations

Department of Gastroenterology, Ankara Bilkent City Hospital, Ankara 06170, Turkey.

Department of Gastroenterology, Ankara Yildirim Beyazit University Yenimahalle Training and Research Hospital, Ankara 06560, Turkey.

Publication information

J Clin Med. 2025 Jun 29;14(13):4599. doi: 10.3390/jcm14134599.

Abstract

Chat Generative Pretrained Transformer (ChatGPT) is a useful resource for individuals working in the healthcare field. This paper describes several ways in which ChatGPT-4 can achieve greater accuracy in its diagnosis and treatment plans for ulcerative colitis (UC) and Crohn's disease (CD) by following the guidelines set out by the European Crohn's and Colitis Organization (ECCO). A survey of 102 questions was developed to assess the precision and consistency of the responses regarding UC and CD. The questionnaire comprised true/false and multiple-choice questions designed to simulate real-life scenarios and adhere to the ECCO guidelines. We used Likert scales to assess the responses. The questions were put to ChatGPT-4 on Day 1, Day 15, and Day 180. The 51 true/false items remained stable over the six-month period, with an accuracy of 92.8% at baseline and on Day 15, peaking at 98.0% on Day 180; the effect size was negligible. Accuracy on the multiple-choice questions was 90.2% on Day 1, reached its highest point at 92.2% on Day 15, and then decreased to 84.3% on Day 180. However, the reliability of the data was found to be suboptimal, and the impact was deemed negligible. A modest, transient increase in performance was observed at Day 15, which diminished by Day 180, resulting in negligible effect sizes. ChatGPT-4 demonstrates potential as a clinical decision support system for UC and CD, but its performance is marked by temporal variability and inconsistency across tasks. Before artificial intelligence (AI) technology is involved in inflammatory bowel disease (IBD) trials, essential steps include routine revalidation, multi-rater comparisons, prompt standardization, and a comprehensive understanding of the model's limitations.
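The multiple-choice percentages reported above can be related back to item counts with a little arithmetic. The sketch below is illustrative only and rests on two assumptions not stated in the abstract: that the multiple-choice set holds the remaining 51 of the 102 questions, and that accuracy is the simple proportion of items answered correctly. The function name `matching_counts` is hypothetical.

```python
# Assumption: 102 questions in total, 51 of them true/false,
# so the multiple-choice set is taken to be the remaining 51.
TOTAL_ITEMS = 102
TRUE_FALSE_ITEMS = 51
MC_ITEMS = TOTAL_ITEMS - TRUE_FALSE_ITEMS

# Multiple-choice accuracies as reported in the abstract (percent).
REPORTED_MC = {"Day 1": 90.2, "Day 15": 92.2, "Day 180": 84.3}


def matching_counts(reported_pct, n_items, decimals=1):
    """Return every whole-number count k such that k / n_items,
    expressed as a percentage and rounded, equals reported_pct."""
    return [
        k
        for k in range(n_items + 1)
        if abs(round(100 * k / n_items, decimals) - reported_pct) < 1e-9
    ]


if __name__ == "__main__":
    for day, pct in REPORTED_MC.items():
        print(day, pct, matching_counts(pct, MC_ITEMS))
```

Under these assumptions, the reported figures correspond to 46, 47, and 43 correct answers out of 51, so the decline from Day 15 to Day 180 amounts to four items.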

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3bc8/12250039/8b41c46ef9c5/jcm-14-04599-g001.jpg
