ChatGPT compared to national guidelines for management of ovarian cancer: Did ChatGPT get it right? - A Memorial Sloan Kettering Cancer Center Team Ovary study.

Affiliations

Gynecology Service, Department of Surgery, Memorial Sloan Kettering Cancer Center, New York, NY, USA.

Gynecology Service, Department of Surgery, Memorial Sloan Kettering Cancer Center, New York, NY, USA; Department of Obstetrics and Gynecology, Weill Cornell Medical College, New York, NY, USA.

Publication information

Gynecol Oncol. 2024 Oct;189:75-79. doi: 10.1016/j.ygyno.2024.07.007. Epub 2024 Jul 22.

Abstract

OBJECTIVES

We evaluated the performance of a chatbot compared to the National Comprehensive Cancer Network (NCCN) Guidelines for the management of ovarian cancer.

METHODS

Using NCCN Guidelines, we generated 10 questions and answers regarding management of ovarian cancer at a single point in time. Questions were thematically divided into risk factors, surgical management, medical management, and surveillance. We asked ChatGPT (GPT-4) to provide responses without prompting (unprompted GPT) and with prompt engineering (prompted GPT). Responses were blinded and evaluated for accuracy and completeness by 5 gynecologic oncologists. A score of 0 was defined as inaccurate, 1 as accurate but incomplete, and 2 as accurate and complete. Evaluations were compared among NCCN, unprompted GPT, and prompted GPT answers.
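
The three-level rubric described here lends itself to a simple tally. The following is a minimal Python sketch, not the authors' analysis code. It assumes every reviewer scored every response (10 questions x 5 reviewers = 50 ratings per source, which the abstract does not state explicitly). The names `SCORE_LABELS` and `summarize_scores` are hypothetical, and the ratings in the usage example are made-up placeholders, not study data.

```python
from collections import Counter

# Rubric from the Methods section: 0 = inaccurate,
# 1 = accurate but incomplete, 2 = accurate and complete.
SCORE_LABELS = {
    0: "inaccurate",
    1: "accurate_incomplete",
    2: "accurate_complete",
}


def summarize_scores(ratings):
    """Return the percentage of ratings in each rubric category.

    `ratings` is a flat list of 0/1/2 scores for one answer source
    (for example, all reviewer scores for unprompted GPT).
    """
    if not ratings:
        raise ValueError("ratings must not be empty")
    unknown = set(ratings) - set(SCORE_LABELS)
    if unknown:
        raise ValueError(f"unexpected scores: {sorted(unknown)}")
    counts = Counter(ratings)
    total = len(ratings)
    return {
        label: 100.0 * counts[score] / total
        for score, label in SCORE_LABELS.items()
    }


# Hypothetical placeholder data: 50 ratings (10 questions x 5 reviewers).
example_ratings = [2] * 30 + [1] * 15 + [0] * 5
print(summarize_scores(example_ratings))
```

Pooling all ratings for a source into one flat list mirrors how the Results report a single percentage per source; a per-question or per-reviewer breakdown would need the ratings keyed by question and reviewer instead.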

RESULTS

Overall, 48% of responses from NCCN, 64% from unprompted GPT, and 66% from prompted GPT were accurate and complete. The percentage of accurate but incomplete responses was higher for NCCN vs GPT-4. The percentage of accurate and complete scores for questions regarding risk factors, surgical management, and surveillance was higher for GPT-4 vs NCCN; however, for questions regarding medical management, the percentage was lower for GPT-4 vs NCCN. Overall, 14% of responses from unprompted GPT, 12% from prompted GPT, and 10% from NCCN were inaccurate.
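
Because the rubric has only three categories, the share of accurate but incomplete responses can be recovered from the two figures reported for each source. The short Python sketch below works through that arithmetic. It assumes the three categories are exhaustive and that the reported percentages for a source share one denominator; the variable names are illustrative.

```python
# Percentages reported in the Results section, as
# (accurate and complete, inaccurate) for each answer source.
reported = {
    "NCCN": (48, 10),
    "unprompted GPT": (64, 14),
    "prompted GPT": (66, 12),
}

# Assumption: every rating is 0, 1, or 2, so the three shares sum to 100%.
accurate_incomplete = {
    source: 100 - complete - inaccurate
    for source, (complete, inaccurate) in reported.items()
}

for source, pct in accurate_incomplete.items():
    print(f"{source}: {pct}% accurate but incomplete")
```

Under those assumptions the remainder is 42% for NCCN and 22% for each GPT-4 condition, which is consistent with the statement that accurate but incomplete responses were more common for NCCN.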

CONCLUSIONS

GPT-4 provided accurate and complete responses at a single point in time to a limited set of questions regarding ovarian cancer, with best performance in areas of risk factors, surgical management, and surveillance. Occasional inaccuracies, however, should limit unsupervised use of chatbots at this time.
