Gupta Suhasini, Haislup Brett D, Hoffman Ryan A, Murthi Anand M
T.H. Chan School of Medicine University of Massachusetts Worcester Massachusetts USA.
Department of Shoulder and Elbow Surgery MedStar Union Memorial Hospital Baltimore Maryland USA.
J Exp Orthop. 2025 May 19;12(2):e70281. doi: 10.1002/jeo2.70281. eCollection 2025 Apr.
The purpose of this study was to analyze the quality, accuracy, reliability and readability of information provided by an artificial intelligence (AI) model, ChatGPT (OpenAI, San Francisco), regarding distal biceps tendon repair surgery.
ChatGPT 3.5 was used to answer 27 questions commonly asked by patients regarding 'distal biceps repair surgery'. These questions were categorized using the Rothwell classification (fact, policy and value). The answers generated by ChatGPT were analyzed using the DISCERN scale, the Journal of the American Medical Association (JAMA) benchmark criteria, the Flesch-Kincaid Reading Ease Score (FRES) and the Flesch-Kincaid Grade Level (FKGL).
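The two readability metrics named above are defined by standard formulas over average sentence length and average syllables per word. The sketch below illustrates how such scores are computed; it is not the tool the authors used, and the syllable counter is a rough vowel-group heuristic introduced here purely for illustration.

```python
# Illustrative sketch of the Flesch-Kincaid Reading Ease (FRES) and
# Grade Level (FKGL) formulas. Assumption: the standard published
# coefficients; the syllable counter is a crude heuristic, not the
# exact software the study used.
import re

def count_syllables(word):
    # Heuristic: count runs of consecutive vowels; drop a silent final 'e'.
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and n > 1:
        n -= 1
    return max(n, 1)

def readability(text):
    sentences = max(len(re.findall(r"[.!?]+", text)), 1)
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    asl = len(words) / sentences   # average sentence length (words/sentence)
    asw = syllables / len(words)   # average syllables per word
    fres = 206.835 - 1.015 * asl - 84.6 * asw
    fkgl = 0.39 * asl + 11.8 * asw - 15.59
    return fres, fkgl
```

Lower FRES and higher FKGL indicate harder text; FRES scores in the low 20s, as reported in this study, correspond to roughly college-graduate-level reading difficulty.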
The DISCERN scores for the three Rothwell question categories were 59, 61 and 59 (all considered 'good'). The JAMA benchmark criteria score was 0, the lowest possible score, for all three categories. The FRES scores for the three categories were 24.49, 22.82 and 21.77, and the corresponding FKGL scores were 14.96, 14.78 and 15.00.
The answers provided by ChatGPT were a 'good' source in terms of quality assessment compared with other online resources, although they do not offer citations. The accuracy and reliability of these answers were shown to be low, and their readability was at nearly a college-graduate level. Physicians should therefore caution patients who turn to ChatGPT for information regarding distal biceps repair. ChatGPT is a promising source for patients to learn about their procedure, but its limited reliability and demanding readability are disadvantages for the average patient.