Department of Orthopaedic Surgery, Duke University Hospital, Durham, NC.
J Hand Surg Am. 2023 Nov;48(11):1122-1127. doi: 10.1016/j.jhsa.2023.08.003. Epub 2023 Sep 9.
The purpose of this study was to analyze the quality and readability of information generated by an online artificial intelligence (AI) platform regarding 4 common hand surgeries and to compare the AI-generated responses with those provided in the informational articles published on the American Society for Surgery of the Hand (ASSH) HandCare website.
An AI language model (ChatGPT) was used to answer questions commonly asked by patients about 4 common hand surgeries (carpal tunnel release, cubital tunnel release, trigger finger release, and distal radius fracture fixation). The answers were evaluated for medical accuracy, quality, and readability, and were compared with answers derived from the ASSH HandCare materials.
For the AI model, the Journal of the American Medical Association benchmark criteria score was 0/4, and the DISCERN score was 58 (considered good). The areas in which the AI model lost points were primarily related to the lack of attribution, reliability, and currency of the source material. For the AI responses, the mean Flesch-Kincaid Reading Ease score was 34, and the mean Flesch-Kincaid Grade Level was 15, which is considered college level. For comparison, the ASSH HandCare materials scored 3/4 on the Journal of the American Medical Association benchmark, 71 on DISCERN (excellent), 9 on Flesch-Kincaid Grade Level, and 60 on Flesch-Kincaid Reading Ease (eighth/ninth-grade level).
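For context, both readability indices are standard functions of average sentence length and average syllables per word; the abstract does not reproduce the formulas, but assuming the conventional definitions were used, they are:

Reading Ease = 206.835 - 1.015 x (total words / total sentences) - 84.6 x (total syllables / total words)
Grade Level = 0.39 x (total words / total sentences) + 11.8 x (total syllables / total words) - 15.59

Lower Reading Ease and higher Grade Level both indicate more difficult text, which is why the AI responses (Reading Ease 34, Grade Level 15) correspond to college-level reading, whereas the ASSH materials (Reading Ease 60, Grade Level 9) correspond to an eighth/ninth-grade level.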
An AI language model (ChatGPT) provided generally high-quality answers to frequently asked questions about the common hand procedures queried, but without citations to source material, it is unclear where the information came from or how current it is. Furthermore, a high reading level was required to comprehend the information presented. The AI software repeatedly referenced the need to discuss these questions with a surgeon, the importance of shared decision-making and individualized care, and adherence to the surgeon's treatment recommendations.
As novel AI applications become increasingly mainstream, hand surgeons must understand the limitations of these technologies and their ramifications for patient care.