Oeding Jacob F, Lu Amy Z, Mazzucco Michael, Fu Michael C, Dines David M, Warren Russell F, Gulotta Lawrence V, Dines Joshua S, Kunze Kyle N
Department of Orthopaedics, Institute of Clinical Sciences, The Sahlgrenska Academy, University of Gothenburg, Gothenburg, Sweden.
Weill Cornell Medical College, New York, New York, USA.
J Exp Orthop. 2024 Dec 17;11(4):e70114. doi: 10.1002/jeo2.70114. eCollection 2024 Oct.
Purpose: To determine the scope and accuracy of medical information provided by ChatGPT-4 in response to clinical queries concerning total shoulder arthroplasty (TSA), and to compare these results to those of the Google search engine.
Methods: A patient-replicated query for 'total shoulder replacement' was performed using both Google Web Search (the most frequently used search engine worldwide) and ChatGPT-4. The top 10 frequently asked questions (FAQs), answers, and associated sources were extracted. The search was then repeated independently to identify the top 10 FAQs requiring numerical responses, so that the concordance of answers could be compared between Google and ChatGPT-4. Two blinded orthopaedic shoulder surgeons graded the clinical relevance and accuracy of the provided information.
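As a rough, non-authoritative illustration of the concordance tally described in the Methods, the sketch below counts how many of the 10 numeric FAQs were judged concordant. The per-question labels are hypothetical placeholders (chosen here only to reproduce the 8/10 figure reported in the Results); the study's actual grading data are not reproduced here.

```python
# Hypothetical sketch of the concordance tally; the labels are invented
# placeholders that merely reproduce the reported 8/10 concordance.
from collections import Counter

numeric_faq_concordance = [
    "identical", "identical", "overlap", "identical", "overlap",
    "identical", "different", "identical", "different", "overlap",
]

counts = Counter(numeric_faq_concordance)
concordant = counts["identical"] + counts["overlap"]
print(f"Concordant numeric answers: {concordant}/10 ({concordant / 10:.0%})")
```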
Results: For FAQs with numeric responses, 8 of 10 (80%) answers were identical or overlapped substantially between ChatGPT-4 and Google. The accuracy of information was not significantly different (p = 0.32). Google's sources comprised 40% medical practices, 30% academic, 20% single-surgeon practices and 10% social media, whereas ChatGPT-4 used 100% academic sources, a statistically significant difference (p = 0.001). Only 3 of 10 (30%) FAQs with open-ended answers were identical between ChatGPT-4 and Google. The clinical relevance of FAQs was not significantly different (p = 0.18). Google's sources for open-ended questions comprised academic (60%), social media (20%), medical practice (10%) and single-surgeon practice (10%), whereas 100% of ChatGPT-4's sources were academic, a statistically significant difference (p = 0.0025).
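The abstract does not state which statistical test produced the source-distribution p-values; assuming a chi-square test of independence on counts reconstructed from the reported percentages (n = 10 sources per engine for the numeric FAQs), one way to run the comparison is sketched below. With such small expected counts the chi-square approximation is rough, so the printed p-value need not match the published one.

```python
# Assumed chi-square test on source categories for the numeric FAQs;
# counts are reconstructed from the reported percentages, and the
# choice of test is an assumption, not stated in the paper.
from scipy.stats import chi2_contingency

# Rows: Google, ChatGPT-4.
# Columns: medical practice, academic, single-surgeon practice, social media.
observed = [
    [4, 3, 2, 1],   # Google: 40% / 30% / 20% / 10% of 10 sources
    [0, 10, 0, 0],  # ChatGPT-4: 100% academic
]

chi2, p, dof, expected = chi2_contingency(observed)
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p:.4f}")
```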
Conclusions: ChatGPT-4 provided trustworthy academic sources for medical information retrieval concerning TSA, while the sources used by Google were heterogeneous. The accuracy and clinical relevance of information were not significantly different between ChatGPT-4 and Google.
Level of Evidence: Level IV, cross-sectional study.