

Large Language Model-Based Chatbot vs Surgeon-Generated Informed Consent Documentation for Common Procedures.

Affiliations

Department of Surgery, University of California, San Francisco.

Department of Medicine, University of California, San Francisco.

Publication Info

JAMA Netw Open. 2023 Oct 2;6(10):e2336997. doi: 10.1001/jamanetworkopen.2023.36997.

DOI: 10.1001/jamanetworkopen.2023.36997
PMID: 37812419
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10562939/
Abstract

IMPORTANCE

Informed consent is a critical component of patient care before invasive procedures, yet it is frequently inadequate. Electronic consent forms have the potential to facilitate patient comprehension if they provide information that is readable, accurate, and complete; it is not known if large language model (LLM)-based chatbots may improve informed consent documentation by generating accurate and complete information that is easily understood by patients.

OBJECTIVE

To compare the readability, accuracy, and completeness of LLM-based chatbot- vs surgeon-generated information on the risks, benefits, and alternatives (RBAs) of common surgical procedures.

DESIGN, SETTING, AND PARTICIPANTS

This cross-sectional study compared randomly selected surgeon-generated RBAs used in signed electronic consent forms at an academic referral center in San Francisco with LLM-based chatbot-generated (ChatGPT-3.5, OpenAI) RBAs for 6 surgical procedures (colectomy, coronary artery bypass graft, laparoscopic cholecystectomy, inguinal hernia repair, knee arthroplasty, and spinal fusion).

MAIN OUTCOMES AND MEASURES

Readability was measured using previously validated scales (Flesch-Kincaid grade level, Gunning Fog index, the Simple Measure of Gobbledygook, and the Coleman-Liau index). Scores range from 0 to greater than 20 to indicate the years of education required to understand a text. Accuracy and completeness were assessed using a rubric developed with recommendations from LeapFrog, the Joint Commission, and the American College of Surgeons. Both composite and RBA subgroup scores were compared.
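The four scales above are deterministic functions of word, sentence, syllable, and letter counts. As a rough illustration only (not the study's actual scoring code), the standard published formulas can be sketched as follows; the count inputs are assumed to come from an upstream tokenizer and syllable counter, which in practice is where implementations differ:

```python
import math

def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid grade level: approximate years of education needed."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

def gunning_fog(words: int, sentences: int, complex_words: int) -> float:
    """Gunning Fog index; complex_words = words with three or more syllables."""
    return 0.4 * ((words / sentences) + 100 * (complex_words / words))

def smog(sentences: int, polysyllables: int) -> float:
    """Simple Measure of Gobbledygook, normalized to a 30-sentence sample."""
    return 1.043 * math.sqrt(polysyllables * (30 / sentences)) + 3.1291

def coleman_liau(letters: int, words: int, sentences: int) -> float:
    """Coleman-Liau index, from letters (L) and sentences (S) per 100 words."""
    L = letters / words * 100
    S = sentences / words * 100
    return 0.0588 * L - 0.296 * S - 15.8
```

All four return a US school grade level, which is why they can be averaged or compared directly; on these scales a score above 12 exceeds a high-school reading level, the context for the mean scores reported in the results below.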

RESULTS

The total sample consisted of 36 RBAs, with 1 RBA generated by the LLM-based chatbot and 5 RBAs generated by a surgeon for each of the 6 surgical procedures. The mean (SD) readability score for the LLM-based chatbot RBAs was 12.9 (2.0) vs 15.7 (4.0) for surgeon-generated RBAs (P = .10). The mean (SD) composite completeness and accuracy score was lower for surgeons' RBAs at 1.6 (0.5) than for LLM-based chatbot RBAs at 2.2 (0.4) (P < .001). The LLM-based chatbot scores were higher than the surgeon-generated scores for descriptions of the benefits of surgery (2.3 [0.7] vs 1.4 [0.7]; P < .001) and alternatives to surgery (2.7 [0.5] vs 1.4 [0.7]; P < .001). There was no significant difference in chatbot vs surgeon RBA scores for risks of surgery (1.7 [0.5] vs 1.7 [0.4]; P = .38).

CONCLUSIONS AND RELEVANCE

The findings of this cross-sectional study suggest that despite not being perfect, LLM-based chatbots have the potential to enhance informed consent documentation. If an LLM were embedded in electronic health records in a manner compliant with the Health Insurance Portability and Accountability Act, it could be used to provide personalized risk information while easing documentation burden for physicians.

Figure: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5afe/10562939/3304e0b53776/jamanetwopen-e2336997-g001.jpg

Similar Articles

1
Large Language Model-Based Chatbot vs Surgeon-Generated Informed Consent Documentation for Common Procedures.
JAMA Netw Open. 2023 Oct 2;6(10):e2336997. doi: 10.1001/jamanetworkopen.2023.36997.
2
Comparison of Medical Research Abstracts Written by Surgical Trainees and Senior Surgeons or Generated by Large Language Models.
JAMA Netw Open. 2024 Aug 1;7(8):e2425373. doi: 10.1001/jamanetworkopen.2024.25373.
3
Assessment of a Large Language Model's Responses to Questions and Cases About Glaucoma and Retina Management.
JAMA Ophthalmol. 2024 Apr 1;142(4):371-375. doi: 10.1001/jamaophthalmol.2023.6917.
4
Assessing the response quality and readability of chatbots in cardiovascular health, oncology, and psoriasis: A comparative study.
Int J Med Inform. 2024 Oct;190:105562. doi: 10.1016/j.ijmedinf.2024.105562. Epub 2024 Jul 19.
5
Enhancing Health Literacy: Evaluating the Readability of Patient Handouts Revised by ChatGPT's Large Language Model.
Otolaryngol Head Neck Surg. 2024 Dec;171(6):1751-1757. doi: 10.1002/ohn.927. Epub 2024 Aug 6.
6
Generative Artificial Intelligence to Transform Inpatient Discharge Summaries to Patient-Friendly Language and Format.
JAMA Netw Open. 2024 Mar 4;7(3):e240357. doi: 10.1001/jamanetworkopen.2024.0357.
7
Assessing the readability, reliability, and quality of artificial intelligence chatbot responses to the 100 most searched queries about cardiopulmonary resuscitation: An observational study.
Medicine (Baltimore). 2024 May 31;103(22):e38352. doi: 10.1097/MD.0000000000038352.
8
Comparison of Ophthalmologist and Large Language Model Chatbot Responses to Online Patient Eye Care Questions.
JAMA Netw Open. 2023 Aug 1;6(8):e2330320. doi: 10.1001/jamanetworkopen.2023.30320.
9
Quality of Large Language Model Responses to Radiation Oncology Patient Care Questions.
JAMA Netw Open. 2024 Apr 1;7(4):e244630. doi: 10.1001/jamanetworkopen.2024.4630.
10
Dr. Google vs. Dr. ChatGPT: Exploring the Use of Artificial Intelligence in Ophthalmology by Comparing the Accuracy, Safety, and Readability of Responses to Frequently Asked Patient Questions Regarding Cataracts and Cataract Surgery.
Semin Ophthalmol. 2024 Aug;39(6):472-479. doi: 10.1080/08820538.2024.2326058. Epub 2024 Mar 22.

Cited By

1
Comparison of the readability of ChatGPT and Bard in medical communication: a meta-analysis.
BMC Med Inform Decis Mak. 2025 Sep 1;25(1):325. doi: 10.1186/s12911-025-03035-2.
2
Patient consent in the modern era: Novel tools and practical considerations in urology.
Curr Urol. 2025 Jul;19(4):235-240. doi: 10.1097/CU9.0000000000000282. Epub 2025 Apr 1.
3
A large language model digital patient system enhances ophthalmology history taking skills.
NPJ Digit Med. 2025 Aug 4;8(1):502. doi: 10.1038/s41746-025-01841-6.
4
Information Extraction and Summarization for Neurovascular Consultations with GPT-4o: A Clinical Case Study.
Clin Neuroradiol. 2025 Jul 31. doi: 10.1007/s00062-025-01538-z.
5
The Role of Large Language Models (LLMs) in Hepato-Pancreato-Biliary Surgery: Opportunities and Challenges.
Cureus. 2025 Jun 14;17(6):e85979. doi: 10.7759/cureus.85979. eCollection 2025 Jun.
6
Evaluation and comparison of large language models' responses to questions related optic neuritis.
Front Med (Lausanne). 2025 Jun 25;12:1516442. doi: 10.3389/fmed.2025.1516442. eCollection 2025.
7
Clinical applications of large language models in medicine and surgery: A scoping review.
J Int Med Res. 2025 Jul;53(7):3000605251347556. doi: 10.1177/03000605251347556. Epub 2025 Jul 4.
8
Large Language Model-Assisted Surgical Consent Forms in Non-English Language: Content Analysis and Readability Evaluation.
J Med Internet Res. 2025 Jun 19;27:e73222. doi: 10.2196/73222.
9
Areas of research focus and trends in the research on the application of AIGC in healthcare.
J Health Popul Nutr. 2025 Jun 14;44(1):195. doi: 10.1186/s41043-025-00947-7.
10
Primer on large language models: an educational overview for intensivists.
Crit Care. 2025 Jun 12;29(1):238. doi: 10.1186/s13054-025-05479-4.