
Colorectal Cancer Prevention: Is Chat Generative Pretrained Transformer (Chat GPT) Ready to Assist Physicians in Determining Appropriate Screening and Surveillance Recommendations?

Author Information

Lisandro Pereyra, Francisco Schlottmann, Leandro Steinberg, Juan Lasa

Affiliations

Department of Gastroenterology.

Endoscopy Unit, Department of Surgery.

Publication Information

J Clin Gastroenterol. 2024;58(10):1022-1027. doi: 10.1097/MCG.0000000000001979. Epub 2024 Feb 7.

Abstract

OBJECTIVE

To determine whether a publicly available advanced language model could help determine appropriate colorectal cancer (CRC) screening and surveillance recommendations.

BACKGROUND

Poor physician knowledge or inability to accurately recall recommendations might affect adherence to CRC screening guidelines. Adoption of newer technologies can help improve the delivery of such preventive care services.

METHODS

An assessment with 10 multiple-choice questions, comprising 5 CRC screening and 5 CRC surveillance clinical vignettes, was entered into Chat Generative Pretrained Transformer (ChatGPT) 3.5 in 4 separate sessions. Responses were recorded and checked for accuracy to determine the reliability of the tool. The mean number of correct answers was then compared with that of a control group of gastroenterologists and colorectal surgeons who answered the same questions with and without the help of a previously validated CRC screening mobile app.
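For illustration only, the four-session protocol could be automated along the following lines. This is a minimal sketch, not the authors' procedure: the study presumably used the public ChatGPT web interface, whereas the sketch assumes the OpenAI Python SDK, the gpt-3.5-turbo model name, and a QUESTIONS placeholder list, none of which appear in the paper.

```python
# Minimal sketch of the four-session assessment protocol (assumed tooling:
# the OpenAI Python SDK; the study itself likely used the web interface).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Placeholder for the 10 multiple-choice vignettes (5 screening + 5 surveillance);
# the actual vignettes are in the paper and are not reproduced here.
QUESTIONS = [
    "Vignette 1: ... (A) ... (B) ... (C) ... (D) ...",
    # ... 9 more ...
]

def run_session(questions):
    """Ask each vignette independently (no shared chat history) and return raw answers."""
    answers = []
    for q in questions:
        resp = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": q}],
        )
        answers.append(resp.choices[0].message.content)
    return answers

# Four separate sessions, mirroring the study design; the recorded answers
# would then be graded against a guideline-based answer key.
sessions = [run_session(QUESTIONS) for _ in range(4)]
```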

RESULTS

The average overall performance of ChatGPT was 45%. The mean number of correct answers was 2.75 (95% CI: 2.26-3.24), 1.75 (95% CI: 1.26-2.24), and 4.5 (95% CI: 3.93-5.07) for screening, surveillance, and total questions, respectively. ChatGPT was also inconsistent, giving a different answer to 4 of the questions across sessions. A total of 238 physicians completed the same assessment: 123 (51.7%) without and 115 (48.3%) with the mobile app. The mean number of total correct answers of ChatGPT was significantly lower than that of physicians both without [5.62 (95% CI: 5.32-5.92)] and with the mobile app [7.71 (95% CI: 7.39-8.03); P < 0.001].
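A note on the statistics: with n = 4 sessions, the reported intervals are consistent with a normal-approximation 95% CI, mean ± 1.96 × SE. The per-session scores below are hypothetical (the paper does not publish session-level data); they are chosen only so the means match the abstract, and under that assumption the formula reproduces the reported intervals:

```python
# Hedged check of the reported ChatGPT confidence intervals. The per-session
# scores are HYPOTHETICAL, chosen only to match the reported means; the paper
# does not publish session-level data.
from statistics import mean, stdev
from math import sqrt

screening = [3, 2, 3, 3]     # hypothetical correct answers per session (mean 2.75)
surveillance = [2, 2, 2, 1]  # hypothetical (mean 1.75)
total = [s + v for s, v in zip(screening, surveillance)]  # [5, 4, 5, 4] (mean 4.5)

def ci95(scores):
    """Normal-approximation 95% CI for the mean: mean +/- 1.96 * sd/sqrt(n)."""
    m, se = mean(scores), stdev(scores) / sqrt(len(scores))
    return m, m - 1.96 * se, m + 1.96 * se

for label, scores in [("screening", screening),
                      ("surveillance", surveillance),
                      ("total", total)]:
    m, lo, hi = ci95(scores)
    print(f"{label}: {m:.2f} (95% CI: {lo:.2f}-{hi:.2f})")
# screening: 2.75 (95% CI: 2.26-3.24)
# surveillance: 1.75 (95% CI: 1.26-2.24)
# total: 4.50 (95% CI: 3.93-5.07)
```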

CONCLUSIONS

Large language models developed with artificial intelligence require further refinements to serve as reliable assistants in clinical practice.

