Artamonov Alexander, Bachar-Avnieli Ira, Klang Eyal, Lubovsky Omri, Atoun Ehud, Bermant Alexander, Rosinsky Philip J
Orthopedic Department, Barzilai Medical Center, Ashkelon, Israel.
Ben-Gurion University, Beer-Sheva, Israel.
Arthrosc Sports Med Rehabil. 2024 Mar 5;6(3):100923. doi: 10.1016/j.asmr.2024.100923. eCollection 2024 Jun.
To compare the similarity of answers provided by Generative Pretrained Transformer-4 (GPT-4) with those of a consensus statement on diagnosis, nonoperative management, and Bankart repair in anterior shoulder instability (ASI).
An expert consensus statement on ASI published by Hurley et al. in 2022 was reviewed, and the questions posed to the expert panel were extracted. GPT-4, the subscription-based version of ChatGPT, was queried with the same set of questions. The answers provided by GPT-4 were compared with those of the expert panel and subjectively rated for similarity by 2 experienced shoulder surgeons. GPT-4 was then asked to rate the similarity of its own responses to the consensus statement, classifying each as low, medium, or high. The similarity ratings assigned by the shoulder surgeons and by GPT-4 were then compared, and interobserver reliability was calculated using weighted κ scores.
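As an illustration of the agreement analysis described above (not the study's actual code), a weighted Cohen's κ between the surgeons' ordinal similarity ratings and GPT-4's self-ratings could be computed as in the following minimal Python sketch; the rating values shown are hypothetical placeholders, and scikit-learn is assumed to be available.

```python
# Minimal sketch: weighted Cohen's kappa between two raters assigning
# ordinal similarity ratings (low/medium/high) to the same questions.
# The rating lists below are hypothetical, for illustration only.
from sklearn.metrics import cohen_kappa_score

# Ordinal encoding: 0 = low, 1 = medium, 2 = high
surgeon_ratings = [2, 1, 0, 1, 2, 1, 0, 1, 1, 2]  # hypothetical
gpt4_ratings = [2, 2, 1, 1, 2, 1, 0, 2, 1, 2]     # hypothetical

# Linear weights penalize disagreements in proportion to their
# distance on the ordinal low-medium-high scale.
kappa = cohen_kappa_score(surgeon_ratings, gpt4_ratings, weights="linear")
print(f"Weighted kappa: {kappa:.2f}")
```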
The degree of similarity between the responses of GPT-4 and the ASI consensus statement, as rated by the shoulder surgeons, was high in 25.8%, medium in 45.2%, and low in 29% of questions. GPT-4 assessed its own similarity as high in 48.3%, medium in 41.9%, and low in 9.7% of questions. The surgeons and GPT-4 agreed on the classification of 18 questions (58.1%) and disagreed on 13 questions (41.9%).
The responses generated by artificial intelligence exhibit limited correlation with an expert statement on the diagnosis and treatment of ASI.
As the use of artificial intelligence becomes more prevalent, it is important to understand how closely AI-generated information resembles content produced by human experts.