Varady Nathan H, Lu Amy Z, Mazzucco Michael, Dines Joshua S, Altchek David W, Williams Riley J, Kunze Kyle N
Department of Orthopaedic Surgery, Hospital for Special Surgery, New York, New York, USA.
Weill Cornell Medical College, New York, New York, USA.
Orthop J Sports Med. 2024 Jul 31;12(7):23259671241257516. doi: 10.1177/23259671241257516. eCollection 2024 Jul.
Background: The consumer availability and automated response functions of Chat Generative Pretrained Transformer 4 (ChatGPT-4), a large language model, position this application to be used for patient health queries, and it may serve as an adjunct that reduces administrative and clinical burden.
Purpose: To evaluate the ability of ChatGPT-4 to respond to patient inquiries concerning ulnar collateral ligament (UCL) injuries and compare these results with the performance of Google.
Study Design: Cross-sectional study.
Methods: Google Web Search was used as a benchmark because it is the most widely used search engine worldwide and the only one that generates frequently asked questions (FAQs) in response to a query, allowing comparisons through a systematic approach. The query "ulnar collateral ligament reconstruction" was entered into Google, and the top 10 FAQs, their answers, and the answer sources were recorded. ChatGPT-4 was prompted to perform a Google search of FAQs with the same query and to record the sources of its answers for comparison. This process was then replicated to obtain 10 new questions requiring numeric rather than open-ended responses. Finally, responses were graded independently for clinical accuracy (grade 0 = inaccurate; grade 1 = somewhat accurate; grade 2 = accurate) by 2 fellowship-trained sports medicine surgeons (D.W.A., J.S.D.) blinded to the search engine and answer source.
Results: ChatGPT-4 used a greater proportion of academic sources than Google to answer the top 10 FAQs, although the difference was not statistically significant (90% vs 50%; P = .14). In terms of question overlap, 40% of the most common questions on Google and ChatGPT-4 were the same. When comparing FAQs with numeric responses, 20% of answers overlapped completely, 30% overlapped partially, and the remaining 50% did not overlap at all. All sources used by ChatGPT-4 to answer these FAQs were academic, whereas only 20% of the sources used by Google were academic (P = .0007). The remaining Google sources included social media (40%), medical practices (20%), single-surgeon websites (10%), and commercial websites (10%). The mean (± standard deviation) accuracy of answers from ChatGPT-4 was significantly greater than that of Google for both the top 10 FAQs (1.9 ± 0.2 vs 1.2 ± 0.6; P = .001) and the top 10 questions with numeric answers (1.8 ± 0.4 vs 1.0 ± 0.8; P = .013).
Conclusion: ChatGPT-4 is capable of providing clinically relevant responses concerning UCL injuries and reconstruction. Compared with Google Web Search, ChatGPT-4 used a greater proportion of academic websites when answering FAQs representative of patient inquiries and provided significantly more accurate answers. Moving forward, ChatGPT has the potential to serve as a clinical adjunct for answering queries about UCL injuries and reconstruction, but further validation is warranted before integrated or autonomous use in clinical settings.
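The abstract does not name the statistical test behind the P values for the source-type comparisons, but with 10 sources per engine the reported values (.14 for 9/10 vs 5/10 academic; .0007 for 10/10 vs 2/10) are consistent with a two-sided Fisher's exact test on a 2 × 2 table. A minimal standard-library sketch of that check, assuming Fisher's exact test was the method used (the function name is illustrative):

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P value for the 2x2 table [[a, b], [c, d]],
    summing hypergeometric probabilities no larger than the observed table's."""
    row1, row2 = a + b, c + d
    col1 = a + c            # total "successes" (academic sources) across both groups
    n = row1 + row2
    def p(k):               # P(k successes in row 1) given fixed margins
        return comb(col1, k) * comb(n - col1, row1 - k) / comb(n, row1)
    p_obs = p(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # small tolerance so exact floating-point ties are counted
    return sum(p(k) for k in range(lo, hi + 1) if p(k) <= p_obs * (1 + 1e-9))

# Top 10 FAQs, academic sources: ChatGPT-4 9/10 vs Google 5/10
print(round(fisher_exact_two_sided(9, 1, 5, 5), 2))   # -> 0.14
# Numeric-answer FAQs: ChatGPT-4 10/10 vs Google 2/10
print(round(fisher_exact_two_sided(10, 0, 2, 8), 4))  # -> 0.0007
```

Both computed values match the abstract's reported P values, which supports (but does not confirm) that Fisher's exact test was used for these proportion comparisons; `scipy.stats.fisher_exact` gives the same results for the same tables.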