Division of General Surgery, Department of Surgery, McMaster University, Hamilton, ON, Canada.
Ross University School of Medicine, Miramar, FL, USA.
Surg Endosc. 2024 Oct;38(10):5668-5677. doi: 10.1007/s00464-024-11155-5. Epub 2024 Aug 12.
Large Language Models (LLMs) provide clinical guidance with inconsistent accuracy due to limitations of their training datasets. LLMs are "teachable" through customization. We compared the ability of the generic ChatGPT-4 model and a customized version of ChatGPT-4 to provide recommendations for the surgical management of gastroesophageal reflux disease (GERD) to both surgeons and patients.
Sixty patient cases were developed using eligibility criteria from the Society of American Gastrointestinal and Endoscopic Surgeons (SAGES) & United European Gastroenterology (UEG)-European Association of Endoscopic Surgery (EAES) guidelines for the surgical management of GERD. Standardized prompts were engineered for physicians as the end-user, with separate layperson prompts for patients. A customized GPT, called the GERD Tool for Surgery (GTS), was developed to generate recommendations based on the guidelines. Both the GTS and generic ChatGPT-4 were queried on July 21st, 2024. Model performance was evaluated by comparing responses to SAGES & UEG-EAES guideline recommendations. Outcome data were presented using descriptive statistics, including counts and percentages.
The GTS provided accurate recommendations for the surgical management of GERD for 60/60 (100.0%) surgeon inquiries and 40/40 (100.0%) patient inquiries based on guideline recommendations. The generic ChatGPT-4 model generated accurate guidance for 40/60 (66.7%) surgeon inquiries and 19/40 (47.5%) patient inquiries. The GTS produced recommendations based on the 2021 SAGES & UEG-EAES guidelines on the surgical management of GERD, while the generic ChatGPT-4 model generated guidance without citing evidence to support its recommendations.
ChatGPT-4 can be customized to overcome limitations of its training dataset and provide recommendations for the surgical management of GERD with reliable accuracy and consistency. Training LLMs in this way can help integrate this efficient technology into the creation of robust and accurate information for both surgeons and patients. Prospective data are needed to assess its effectiveness in a pragmatic clinical environment.