Oeding Jacob F, Lu Amy Z, Mazzucco Michael, Fu Michael C, Dines David M, Warren Russell F, Gulotta Lawrence V, Dines Joshua S, Kunze Kyle N
Department of Orthopaedics, Institute of Clinical Sciences, The Sahlgrenska Academy, University of Gothenburg, Gothenburg, Sweden.
Weill Cornell Medical College, New York, New York, USA.
J Exp Orthop. 2024 Dec 17;11(4):e70114. doi: 10.1002/jeo2.70114. eCollection 2024 Oct.
Purpose: To determine the scope and accuracy of medical information provided by ChatGPT-4 in response to clinical queries concerning total shoulder arthroplasty (TSA), and to compare these results to those of the Google search engine.
Methods: A patient-replicated query for 'total shoulder replacement' was performed using both Google Web Search (the most frequently used search engine worldwide) and ChatGPT-4. The top 10 frequently asked questions (FAQs), answers, and associated sources were extracted. The search was then repeated independently to identify the top 10 FAQs requiring numerical responses, so that the concordance of answers could be compared between Google and ChatGPT-4. Two blinded orthopaedic shoulder surgeons graded the clinical relevance and accuracy of the provided information.
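As a rough, non-authoritative illustration of the concordance tally described in the Methods, the sketch below counts how many of the 10 numeric FAQs were judged concordant. The per-question labels are hypothetical placeholders (chosen here only to reproduce the 8/10 figure reported in the Results); the study's actual grading data are not reproduced here.

```python
# Hypothetical sketch of the concordance tally; the labels are invented
# placeholders that merely reproduce the reported 8/10 concordance.
from collections import Counter

numeric_faq_concordance = [
    "identical", "identical", "overlap", "identical", "overlap",
    "identical", "different", "identical", "different", "overlap",
]

counts = Counter(numeric_faq_concordance)
concordant = counts["identical"] + counts["overlap"]
print(f"Concordant numeric answers: {concordant}/10 ({concordant / 10:.0%})")
```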
Results: For FAQs with numeric responses, 8 of 10 (80%) answers were identical or overlapped substantially between ChatGPT-4 and Google. The accuracy of information was not significantly different (p = 0.32). Google's sources comprised 40% medical practices, 30% academic, 20% single-surgeon practices and 10% social media, whereas ChatGPT-4 used 100% academic sources, a statistically significant difference (p = 0.001). Only 3 of 10 (30%) FAQs with open-ended answers were identical between ChatGPT-4 and Google. The clinical relevance of FAQs was not significantly different (p = 0.18). Google's sources for open-ended questions comprised academic (60%), social media (20%), medical practice (10%) and single-surgeon practice (10%), whereas 100% of ChatGPT-4's sources were academic, a statistically significant difference (p = 0.0025).
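The abstract does not state which statistical test produced the source-distribution p-values; assuming a chi-square test of independence on counts reconstructed from the reported percentages (n = 10 sources per engine for the numeric FAQs), one way to run the comparison is sketched below. With such small expected counts the chi-square approximation is rough, so the printed p-value need not match the published one.

```python
# Assumed chi-square test on source categories for the numeric FAQs;
# counts are reconstructed from the reported percentages, and the
# choice of test is an assumption, not stated in the paper.
from scipy.stats import chi2_contingency

# Rows: Google, ChatGPT-4.
# Columns: medical practice, academic, single-surgeon practice, social media.
observed = [
    [4, 3, 2, 1],   # Google: 40% / 30% / 20% / 10% of 10 sources
    [0, 10, 0, 0],  # ChatGPT-4: 100% academic
]

chi2, p, dof, expected = chi2_contingency(observed)
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p:.4f}")
```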
Conclusions: ChatGPT-4 provided trustworthy academic sources for medical information retrieval concerning TSA, while the sources used by Google were heterogeneous. The accuracy and clinical relevance of information were not significantly different between ChatGPT-4 and Google.
Level of Evidence: Level IV, cross-sectional study.