Kim Hong Jin, Yoon Pil Whan, Yoon Jae Youn, Kim Hyungtae, Choi Young Jin, Park Sangyoon, Moon Jun-Ki
Department of Orthopaedic Surgery, Kyung-in Regional Military Manpower Administration, Suwon 16440, Republic of Korea.
Department of Orthopedic Surgery, Inje University Sanggye Paik Hospital, Seoul 01757, Republic of Korea.
J Clin Med. 2024 Oct 8;13(19):5971. doi: 10.3390/jcm13195971.
Background: This study aimed to assess the reproducibility and reliability of Chat-Based GPT (ChatGPT)'s responses to 19 statements regarding the management of hip fractures in older adults, as adopted by the American Academy of Orthopaedic Surgeons' (AAOS) evidence-based clinical practice guidelines. Methods: Nineteen statements were obtained from the 2021 AAOS evidence-based clinical practice guidelines. After generating questions based on these 19 statements, we set a prompt for both the GPT-4o and GPT-4 models. We repeated this process three times at 24 h intervals for both models, producing outputs A, B, and C. ChatGPT's performance, the intra-ChatGPT reliability, and the accuracy rates were assessed to evaluate the reproducibility and reliability of its responses to the hip fracture guidelines. Regarding the strengths of recommendation compared with the 2021 AAOS guidelines, we observed accuracies of 0.684, 0.579, and 0.632 for outputs A, B, and C, respectively. Results: The precision was 0.740, 0.737, and 0.718 in outputs A, B, and C, respectively. For the reliability of the strengths of recommendation, the Fleiss kappa was 0.409, indicating a moderate level of agreement. No statistically significant differences in the strengths of recommendation were observed in outputs A, B, and C between the GPT-4o and GPT-4 versions. Conclusions: ChatGPT may be useful in providing guidelines for hip fractures but performs poorly in terms of accuracy and precision. Moreover, hallucinations remain an unresolved limitation of using ChatGPT to search for hip fracture guidelines. The effective utilization of ChatGPT as a patient education tool for the management of hip fractures should be addressed in the future.
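The reliability statistic reported above, Fleiss' kappa, measures chance-corrected agreement among a fixed number of raters (here, the three repeated outputs A, B, and C) assigning categorical labels (recommendation strengths) to the same items (the 19 statements). A minimal sketch of the standard computation is shown below; this is not the authors' code, and the small rating matrix is hypothetical illustrative data, not the study's ratings.

```python
import numpy as np

def fleiss_kappa(counts: np.ndarray) -> float:
    """Fleiss' kappa for a subjects-by-categories count matrix.

    counts[i, j] = number of raters who assigned subject i to category j.
    Every row must sum to the same number of raters k.
    """
    n, _ = counts.shape
    k = counts[0].sum()                       # raters per subject (e.g., 3 outputs)
    p_j = counts.sum(axis=0) / (n * k)        # overall proportion per category
    # Per-subject observed agreement across rater pairs
    P_i = (np.square(counts).sum(axis=1) - k) / (k * (k - 1))
    P_bar = P_i.mean()                        # mean observed agreement
    P_e = np.square(p_j).sum()                # expected agreement by chance
    return (P_bar - P_e) / (1 - P_e)

# Hypothetical example: 5 statements rated by 3 repeated outputs into
# 4 strength categories (illustrative data only).
ratings = np.array([
    [3, 0, 0, 0],
    [2, 1, 0, 0],
    [0, 3, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 2, 1],
])
print(round(fleiss_kappa(ratings), 3))  # prints 0.318
```

By the conventional Landis-Koch benchmarks, a kappa of 0.409, as reported in the study, falls in the 0.41-0.60 "moderate agreement" band (at its boundary), which is how the abstract characterizes it.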