Varady Nathan H, Lu Amy Z, Mazzucco Michael, Dines Joshua S, Altchek David W, Williams Riley J, Kunze Kyle N
Department of Orthopaedic Surgery, Hospital for Special Surgery, New York, New York, USA.
Weill Cornell Medical College, New York, New York, USA.
Orthop J Sports Med. 2024 Jul 31;12(7):23259671241257516. doi: 10.1177/23259671241257516. eCollection 2024 Jul.
Background: The consumer availability and automated response functions of Chat Generative Pretrained Transformer 4 (ChatGPT-4), a large language model, position this application to be used for patient health queries, and it may serve as an adjunct that reduces administrative and clinical burden.
Purpose: To evaluate the ability of ChatGPT-4 to respond to patient inquiries concerning ulnar collateral ligament (UCL) injuries and compare these results with the performance of Google.
Study Design: Cross-sectional study.
Methods: Google Web Search was used as a benchmark because it is the most widely used search engine worldwide and the only one that generates frequently asked questions (FAQs) in response to a query, allowing comparisons through a systematic approach. The query "ulnar collateral ligament reconstruction" was entered into Google, and the top 10 FAQs, their answers, and the answer sources were recorded. ChatGPT-4 was prompted to perform a Google search of FAQs with the same query and to record the sources of its answers for comparison. This process was then replicated to obtain 10 new questions requiring numeric rather than open-ended responses. Finally, responses were graded independently for clinical accuracy (grade 0 = inaccurate; grade 1 = somewhat accurate; grade 2 = accurate) by 2 fellowship-trained sports medicine surgeons (D.W.A., J.S.D.) blinded to the search engine and answer source.
Results: ChatGPT-4 used a greater proportion of academic sources than Google to answer the top 10 FAQs, although the difference was not statistically significant (90% vs 50%; P = .14). In terms of question overlap, 40% of the most common questions on Google and ChatGPT-4 were the same. When comparing FAQs with numeric responses, 20% of answers overlapped completely, 30% overlapped partially, and the remaining 50% did not overlap at all. All sources used by ChatGPT-4 to answer these FAQs were academic, whereas only 20% of the sources used by Google were academic (P = .0007). The remaining Google sources included social media (40%), medical practices (20%), single-surgeon websites (10%), and commercial websites (10%). The mean (± standard deviation) accuracy of answers from ChatGPT-4 was significantly greater than that of Google for both the top 10 FAQs (1.9 ± 0.2 vs 1.2 ± 0.6; P = .001) and the top 10 questions with numeric answers (1.8 ± 0.4 vs 1.0 ± 0.8; P = .013).
Conclusion: ChatGPT-4 is capable of providing clinically relevant responses concerning UCL injuries and reconstruction. Compared with Google Web Search, ChatGPT-4 used a greater proportion of academic websites when answering FAQs representative of patient inquiries and provided significantly more accurate answers. Moving forward, ChatGPT has the potential to serve as a clinical adjunct for answering queries about UCL injuries and reconstruction, but further validation is warranted before integrated or autonomous use in clinical settings.
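The abstract does not name the statistical test behind the P values for the source-type comparisons, but with 10 sources per engine the reported values (.14 for 9/10 vs 5/10 academic; .0007 for 10/10 vs 2/10) are consistent with a two-sided Fisher's exact test on a 2 × 2 table. A minimal standard-library sketch of that check, assuming Fisher's exact test was the method used (the function name is illustrative):

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P value for the 2x2 table [[a, b], [c, d]],
    summing hypergeometric probabilities no larger than the observed table's."""
    row1, row2 = a + b, c + d
    col1 = a + c            # total "successes" (academic sources) across both groups
    n = row1 + row2
    def p(k):               # P(k successes in row 1) given fixed margins
        return comb(col1, k) * comb(n - col1, row1 - k) / comb(n, row1)
    p_obs = p(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # small tolerance so exact floating-point ties are counted
    return sum(p(k) for k in range(lo, hi + 1) if p(k) <= p_obs * (1 + 1e-9))

# Top 10 FAQs, academic sources: ChatGPT-4 9/10 vs Google 5/10
print(round(fisher_exact_two_sided(9, 1, 5, 5), 2))   # -> 0.14
# Numeric-answer FAQs: ChatGPT-4 10/10 vs Google 2/10
print(round(fisher_exact_two_sided(10, 0, 2, 8), 4))  # -> 0.0007
```

Both computed values match the abstract's reported P values, which supports (but does not confirm) that Fisher's exact test was used for these proportion comparisons; `scipy.stats.fisher_exact` gives the same results for the same tables.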