Comparing ChatGPT-3.5 and ChatGPT-4's alignments with the German evidence-based S3 guideline for adult soft tissue sarcoma.

Authors

Li Cheng-Peng, Jakob Jens, Menge Franka, Reißfelder Christoph, Hohenberger Peter, Yang Cui

Affiliations

Key Laboratory of Carcinogenesis and Translational Research (Ministry of Education), Sarcoma Center, Peking University Cancer Hospital & Institute, Beijing, China.

Department of Surgery, University Medical Center Mannheim, Medical Faculty Mannheim, University of Heidelberg, Mannheim, Germany.

Publication information

iScience. 2024 Nov 28;27(12):111493. doi: 10.1016/j.isci.2024.111493. eCollection 2024 Dec 20.

Abstract

Clinical reliability assessment of large language models is necessary due to their increasing use in healthcare. This study assessed the performance of ChatGPT-3.5 and ChatGPT-4 in answering questions derived from the German evidence-based S3 guideline for adult soft tissue sarcoma (STS). Responses to 80 complex clinical questions covering diagnosis, treatment, and surveillance were independently scored by two sarcoma experts for accuracy and adequacy. ChatGPT-4 outperformed ChatGPT-3.5 overall, with higher median scores in both accuracy (5.5 vs. 5.0) and adequacy (5.0 vs. 4.0). While both versions performed similarly on questions about retroperitoneal/visceral sarcoma, gastrointestinal stromal tumor (GIST)-specific treatment, and surveillance, ChatGPT-4 performed better on questions about general STS treatment and extremity/trunk sarcomas. Despite their potential as supportive tools, both models occasionally offered misleading and potentially life-threatening information. This underscores the importance of cautious adoption and human monitoring in clinical settings.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/31ff/11699281/d29472dab53e/fx1.jpg
