Can Large Language Models (LLMs) Predict the Appropriate Treatment of Acute Hip Fractures in Older Adults? Comparing Appropriate Use Criteria With Recommendations From ChatGPT.

Author information

From the Icahn School of Medicine at Mount Sinai, New York, NY (Ms. Nietsch, Mr. Ahmed, Mr. Mejia, Mr. Zaidat, Ms. Ren, and Mr. Duey); the Chicago Medical School, Rosalind Franklin University of Medicine and Science, North Chicago, IL (Ms. Shrestha); Northwestern University, Chicago, IL (Ms. Mazudie Ndjonko); the PGY-6, Department of Orthopedic Surgery and Neurosurgery, Mount Sinai Hospital, New York, NY (Dr. Li); the Department of Orthopedics and Orthopedic Surgery, Mount Sinai Hospital, New York, NY (Dr. Kim); the Department of Orthopedic Surgery, Mayo Clinic, Rochester, MN (Dr. Hidden); and the Department of Orthopedic Surgery and Neurosurgery, Mount Sinai Hospital, New York, NY (Dr. Cho).

Publication information

J Am Acad Orthop Surg Glob Res Rev. 2024 Aug 9;8(8). doi: 10.5435/JAAOSGlobal-D-24-00206. eCollection 2024 Aug 1.

Abstract

BACKGROUND

Acute hip fractures are a public health problem that primarily affects older adults. Chat Generative Pretrained Transformer may be useful in recommending clinically appropriate treatment.

OBJECTIVE

To evaluate the accuracy of Chat Generative Pretrained Transformer (ChatGPT)-4.0 by comparing its appropriateness scores for acute hip fracture treatments with the American Academy of Orthopaedic Surgeons (AAOS) Appropriate Use Criteria across 30 patient scenarios. "Appropriateness" indicates that the expected health benefits of a treatment exceed its expected negative consequences by a wide margin.

METHODS

Appropriateness was scored numerically from 1 to 9, with the AAOS Appropriate Use Criteria as the benchmark. For each patient scenario, ChatGPT-4.0 was asked to assign an appropriateness score to each of six treatments for managing acute hip fractures.
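The scoring setup described above can be sketched in a few lines. This is only an illustration, not the authors' code: the 7-9 / 4-6 / 1-3 bands follow the usual AAOS Appropriate Use Criteria convention, which the abstract itself does not spell out, and the scenario and treatment labels are hypothetical placeholders because the abstract names only four of the six treatments.

```python
from itertools import product


def aaos_category(score: int) -> str:
    """Map a 1-9 appropriateness score to an AUC category label.

    Assumption: bands follow the usual AAOS convention
    (7-9 Appropriate, 4-6 May Be Appropriate, 1-3 Rarely Appropriate).
    """
    if not 1 <= score <= 9:
        raise ValueError(f"score must be between 1 and 9, got {score}")
    if score >= 7:
        return "Appropriate"
    if score >= 4:
        return "May Be Appropriate"
    return "Rarely Appropriate"


# Hypothetical placeholder labels; the real scenario definitions and the
# full list of six treatments are not given in the abstract.
SCENARIOS = [f"scenario_{i}" for i in range(1, 31)]
TREATMENTS = [f"treatment_{j}" for j in range(1, 7)]

# One (scenario, treatment) key per pair of ChatGPT-4.0 and AAOS scores.
PAIR_KEYS = list(product(SCENARIOS, TREATMENTS))
```

Each key in `PAIR_KEYS` would hold one ChatGPT-4.0 score and one AAOS score for the same scenario and treatment.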

RESULTS

Thirty patient scenarios were evaluated, yielding 180 paired scores (30 scenarios × 6 treatments). ChatGPT-4.0 scores correlated positively with AAOS scores for multiple cannulated screw fixation, total hip arthroplasty, hemiarthroplasty, and long cephalomedullary nails. Statistically significant differences were observed only between scores for long cephalomedullary nails.
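A per-treatment comparison of this kind could be sketched as follows. The abstract does not say which statistical tests were used, so the Spearman rank correlation and the mean signed difference here are assumptions chosen for illustration, and the scores are made-up values, not study data.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def mean_signed_difference(model_scores, benchmark_scores):
    """Positive values mean the model scored higher than the benchmark."""
    diffs = [m - b for m, b in zip(model_scores, benchmark_scores)]
    return sum(diffs) / len(diffs)


# Made-up scores for one treatment across five scenarios (illustration only).
chatgpt_scores = [7, 8, 6, 9, 5]
aaos_scores = [5, 7, 4, 8, 3]

rho = spearman(chatgpt_scores, aaos_scores)
bias = mean_signed_difference(chatgpt_scores, aaos_scores)
```

A positive `rho` alongside a positive `bias` would correspond to scores that rise and fall together while the model still rates the treatment as more appropriate than the benchmark does.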

CONCLUSION

ChatGPT-4.0 scores were not concordant with AAOS scores: ChatGPT-4.0 overestimated the appropriateness of total hip arthroplasty, hemiarthroplasty, and long cephalomedullary nails, and underestimated that of the other three treatments. ChatGPT-4.0 was inadequate at selecting the treatment deemed acceptable, most reasonable, and most likely to improve patient outcomes.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/997e/11319315/2b0fa98d8417/jagrr-8-e24.00206-g001.jpg
