Espinal Emil, Jurayj Alexander, Nerys-Figueroa Julio, Gaudiani Michael A, Baes Travis, Mahylis Jared, Muh Stephanie
Department of Orthopaedic Surgery, Henry Ford Hospital, Detroit, Michigan, USA.
Iowa Orthop J. 2025;45(1):19-32.
As online medical resources become more accessible, patients increasingly consult AI platforms like ChatGPT for health-related information. Our study assessed the accuracy and appropriateness of ChatGPT's responses to common questions about lateral epicondylitis, comparing them against OrthoInfo as a gold standard.
Eight frequently asked questions about lateral epicondylitis from OrthoInfo were selected and presented to ChatGPT at both a standard and a sixth-grade reading level. Responses were evaluated for accuracy and appropriateness using a five-point Likert scale, with scores of four or above deemed satisfactory. Evaluations were conducted by two fellowship-trained shoulder and elbow surgeons, two hand surgeons, and one orthopaedic sports fellow. Readability was assessed with the Flesch-Kincaid Grade Level test, and responses were statistically compared using paired t-tests.
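To make the readability and significance testing concrete, the following is a minimal Python sketch, not the authors' code: it computes a Flesch-Kincaid Grade Level with a rough heuristic syllable counter and runs a paired t-test (scipy.stats.ttest_rel) on hypothetical per-question rater scores, which are illustrative assumptions rather than study data.

```python
# Minimal sketch (not the authors' analysis code).
# Syllable counting here is a crude vowel-group heuristic; published
# readability tools use more careful rules or dictionaries.
import re
from scipy.stats import ttest_rel

def count_syllables(word: str) -> int:
    """Approximate syllables as runs of vowels (minimum of one)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_kincaid_grade(text: str) -> float:
    """FKGL = 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * (len(words) / len(sentences))
            + 11.8 * (syllables / len(words)) - 15.59)

# Hypothetical paired accuracy scores, one pair per question, for the
# standard-level vs. sixth-grade-level ChatGPT responses.
standard_scores = [4.8, 4.6, 5.0, 4.4, 4.8, 4.6, 4.8, 4.6]
sixth_grade_scores = [4.2, 3.8, 4.0, 3.4, 4.2, 3.6, 4.0, 4.0]
t_stat, p_value = ttest_rel(standard_scores, sixth_grade_scores)
print(f"paired t-test: t = {t_stat:.2f}, p = {p_value:.3f}")
```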
ChatGPT's responses at the sixth-grade level scored lower in accuracy (mean = 3.9 ± 0.87, p = 0.046) and appropriateness (mean = 3.7 ± 0.92, p = 0.045) compared to the standard level (accuracy = 4.7 ± 0.43, appropriateness = 4.7 ± 0.45). When compared with OrthoInfo, standard responses from ChatGPT showed significantly lower accuracy (mean difference = -0.275, p = 0.004) and appropriateness (mean difference = -0.475, p = 0.016). The Flesch-Kincaid grade level was significantly higher in the standard response group (mean = 14.06, p < 0.001) compared to both OrthoInfo (mean = 8.98) and the sixth-grade responses (mean = 8.48). No significant difference was found between the Flesch-Kincaid grade levels of OrthoInfo and the sixth-grade responses.
At a sixth-grade reading level, ChatGPT provides oversimplified and less accurate information regarding lateral epicondylitis. Although standard-level responses are more accurate, they still do not match the reliability of OrthoInfo and exceed the recommended reading level for patient education materials. While ChatGPT cannot be recommended as a sole information source, it may serve as a supplementary resource alongside professional medical consultation.