Evaluating the accuracy and relevance of ChatGPT responses to frequently asked questions regarding total knee replacement.

Authors

Zhang Siyuan, Liau Zi Qiang Glen, Tan Kian Loong Melvin, Chua Wei Liang

Affiliation

Department of Orthopaedic Surgery, National University Health System, Level 11, NUHS Tower Block, 1E Kent Ridge Road, Singapore, 119228, Singapore.

Publication

Knee Surg Relat Res. 2024 Apr 2;36(1):15. doi: 10.1186/s43019-024-00218-5.

DOI:10.1186/s43019-024-00218-5

PMID:38566254

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10986046/

Abstract

BACKGROUND

Chat Generative Pretrained Transformer (ChatGPT), a generative artificial intelligence chatbot, may have broad applications in healthcare delivery and patient education due to its ability to provide human-like responses to a wide range of patient queries. However, there is limited evidence regarding its ability to provide reliable and useful information on orthopaedic procedures. This study seeks to evaluate the accuracy and relevance of responses provided by ChatGPT to frequently asked questions (FAQs) regarding total knee replacement (TKR).

METHODS

A list of 50 clinically relevant FAQs regarding TKR was collated. Each question was individually entered as a prompt to ChatGPT (version 3.5), and the first response generated was recorded. Two independent orthopaedic surgeons then reviewed the responses and graded each on a Likert scale for factual accuracy and relevance. Using preset thresholds on the Likert scale, the responses were classified as accurate or inaccurate and as relevant or irrelevant.
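The grading-and-thresholding step can be illustrated with a minimal sketch. The abstract does not state the threshold value or how the two surgeons' grades were combined, so the cutoff of 4.0, the averaging of the two grades, and the sample grades below are all assumptions made purely for illustration, not the study's actual procedure or data.

```python
from statistics import mean

# Hypothetical cutoff: the abstract mentions "preset thresholds" but does not give them.
ACCURACY_THRESHOLD = 4.0


def classify(grades_by_question, threshold=ACCURACY_THRESHOLD):
    """Average each question's reviewer grades and compare the mean to the threshold.

    Averaging the two reviewers is an assumption; the abstract does not say
    how the two sets of grades were combined.
    """
    results = {}
    for question, grades in grades_by_question.items():
        score = mean(grades)
        results[question] = (score, score >= threshold)
    return results


def summarize(results):
    """Return (number passing, total, percentage passing)."""
    n_pass = sum(1 for _, passed in results.values() if passed)
    total = len(results)
    return n_pass, total, 100.0 * n_pass / total


# Made-up Likert grades (1 to 5) from two reviewers for three illustrative questions.
example = {"Q1": (5, 5), "Q2": (4, 5), "Q3": (3, 4)}

results = classify(example)
n_pass, total, pct = summarize(results)
print(f"{n_pass}/{total} ({pct:.0f}%) classified as accurate")
```

The same two functions could be applied to relevance grades by passing a different threshold; a different way of combining reviewers (for example, requiring both grades to meet the cutoff) would change only the body of `classify`.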

RESULTS

Most responses were accurate, and all were relevant. Of the 50 responses, 44 (88%) were classified as accurate, with a mean Likert grade of 4.6/5 for factual accuracy. All 50 (100%) were classified as relevant, with a mean Likert grade of 4.9/5 for relevance.

CONCLUSION

ChatGPT performed well in providing accurate and relevant responses to FAQs regarding TKR, demonstrating great potential as a tool for patient education. However, it is not infallible and can occasionally provide inaccurate medical information. Patients and clinicians intending to utilize this technology should be mindful of its limitations and ensure adequate supervision and verification of information provided.

Similar articles

An Artificial Intelligence Chatbot is an Accurate and Useful Online Patient Resource Prior to Total Knee Arthroplasty.

J Arthroplasty. 2024 Aug;39(8S1):S358-S362. doi: 10.1016/j.arth.2024.02.005. Epub 2024 Feb 11.

Understanding How ChatGPT May Become a Clinical Administrative Tool Through an Investigation on the Ability to Answer Common Patient Questions Concerning Ulnar Collateral Ligament Injuries.

Orthop J Sports Med. 2024 Jul 31;12(7):23259671241257516. doi: 10.1177/23259671241257516. eCollection 2024 Jul.

Using a Google Web Search Analysis to Assess the Utility of ChatGPT in Total Joint Arthroplasty.

J Arthroplasty. 2023 Jul;38(7):1195-1202. doi: 10.1016/j.arth.2023.04.007. Epub 2023 Apr 10.

Use and Application of Large Language Models for Patient Questions Following Total Knee Arthroplasty.

J Arthroplasty. 2024 Sep;39(9):2289-2294. doi: 10.1016/j.arth.2024.03.017. Epub 2024 Mar 13.

Do ChatGPT and Google differ in answers to commonly asked patient questions regarding total shoulder and total elbow arthroplasty?

J Shoulder Elbow Surg. 2024 Aug;33(8):e429-e437. doi: 10.1016/j.jse.2023.11.014. Epub 2024 Jan 3.

Dr. Google vs. Dr. ChatGPT: Exploring the Use of Artificial Intelligence in Ophthalmology by Comparing the Accuracy, Safety, and Readability of Responses to Frequently Asked Patient Questions Regarding Cataracts and Cataract Surgery.

Semin Ophthalmol. 2024 Aug;39(6):472-479. doi: 10.1080/08820538.2024.2326058. Epub 2024 Mar 22.

Is ChatGPT accurate and reliable in answering questions regarding head and neck cancer?

Front Oncol. 2023 Dec 1;13:1256459. doi: 10.3389/fonc.2023.1256459. eCollection 2023.

Evaluating ChatGPT responses to frequently asked patient questions regarding periprosthetic joint infection after total hip and knee arthroplasty.

Digit Health. 2024 Aug 9;10:20552076241272620. doi: 10.1177/20552076241272620. eCollection 2024 Jan-Dec.

Chat Generative Pretrained Transformer (ChatGPT) and Bard: Artificial Intelligence Does not yet Provide Clinically Supported Answers for Hip and Knee Osteoarthritis.

J Arthroplasty. 2024 May;39(5):1184-1190. doi: 10.1016/j.arth.2024.01.029. Epub 2024 Jan 17.

Cited by

Expert evaluation of ChatGPT accuracy and reliability for basic celiac disease frequently asked questions.

Sci Rep. 2025 Aug 14;15(1):29871. doi: 10.1038/s41598-025-15898-6.

Large Language Models in Spine Surgery: A Promising Technology.

HSS J. 2025 May 29:15563316251340696. doi: 10.1177/15563316251340696.

Enhancing responses from large language models with role-playing prompts: a comparative study on answering frequently asked questions about total knee arthroplasty.

BMC Med Inform Decis Mak. 2025 May 23;25(1):196. doi: 10.1186/s12911-025-03024-5.

Are Large Language Model-Based Chatbots Effective in Providing Reliable Medical Advice for Achilles Tendinopathy? An International Multispecialist Evaluation.

Orthop J Sports Med. 2025 Apr 30;13(4):23259671251332596. doi: 10.1177/23259671251332596. eCollection 2025 Apr.

Large Language Models' Responses to Spinal Cord Injury: A Comparative Study of Performance.

J Med Syst. 2025 Mar 25;49(1):39. doi: 10.1007/s10916-025-02170-7.

Readability, reliability and quality of responses generated by ChatGPT, gemini, and perplexity for the most frequently asked questions about pain.

Medicine (Baltimore). 2025 Mar 14;104(11):e41780. doi: 10.1097/MD.0000000000041780.

ChatGPT-3.5 and -4.0 Do Not Reliably Create Readable Patient Education Materials for Common Orthopaedic Upper- and Lower-Extremity Conditions.

Arthrosc Sports Med Rehabil. 2024 Oct 10;7(1):101027. doi: 10.1016/j.asmr.2024.101027. eCollection 2025 Feb.

Large language models in patient education: a scoping review of applications in medicine.

Front Med (Lausanne). 2024 Oct 29;11:1477898. doi: 10.3389/fmed.2024.1477898. eCollection 2024.
