Duke University, Durham, North Carolina, U.S.A.
Arthroscopy. 2024 Mar;40(3):726-731.e6. doi: 10.1016/j.arthro.2023.07.048. Epub 2023 Aug 9.
To analyze the quality and readability of information on shoulder stabilization surgery provided by an online AI software (ChatGPT) using standardized scoring systems, and to report on the answers given by the AI.
An open AI model (ChatGPT) was used to answer 23 questions commonly asked by patients about shoulder stabilization surgery. The answers were evaluated for medical accuracy, quality, and readability using the JAMA Benchmark criteria, the DISCERN score, and the Flesch-Kincaid Reading Ease Score (FRES) and Grade Level (FKGL).
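The two Flesch-Kincaid readability measures named above are computed from word, sentence, and syllable counts using the standard published formulas. The following is a minimal sketch of those formulas only. The function names and the sample counts are illustrative assumptions, and the abstract does not say which tool the authors used to score the answers.

```python
# Minimal sketch of the standard Flesch-Kincaid formulas.
# Function names and the sample counts below are hypothetical;
# the study's actual scoring tool is not stated in the abstract.

def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease Score (FRES): lower values mean harder text."""
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Grade Level (FKGL): approximate U.S. school grade."""
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


if __name__ == "__main__":
    # Hypothetical counts for a short passage, not data from the study.
    words, sentences, syllables = 100, 5, 150
    print(round(flesch_reading_ease(words, sentences, syllables), 2))
    print(round(flesch_kincaid_grade(words, sentences, syllables), 2))
```

Both scores depend on the same two ratios: average sentence length and average syllables per word. Longer sentences and longer words lower the FRES and raise the FKGL.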
The JAMA Benchmark criteria score was 0, the lowest possible score, indicating that no reliable resources were cited. The DISCERN score was 60, which is considered a good score. The areas in which the open AI model did not achieve full marks were related to the lack of source material cited for the answers and to some information not being fully supported by the literature. The FRES was 26.2, and the FKGL corresponded to that of a college graduate.
The answers given to questions on shoulder stabilization surgery were generally of high quality, but a high reading level was required to comprehend the information presented. However, it is unclear where the answers came from, as no source material was cited. Notably, ChatGPT repeatedly referenced the need to discuss these questions with an orthopaedic surgeon, the importance of shared decision making, and compliance with surgeon treatment recommendations.
Because shoulder instability is an injury that predominantly affects younger individuals, who may use the Internet for information, this study shows what information patients may be getting online.