Sarikonda Advith, Abishek Robert, Isch Emily L, Momin Arbaz A, Self Mitchell, Sambangi Abhijeet, Carreras Angeleah, Jallo Jack, Harrop Jim, Sivaganesan Ahilan
Department of Neurological Surgery, Thomas Jefferson University, Philadelphia, USA.
Department of General Surgery, Division of Plastic Surgery, Thomas Jefferson University Hospital, Philadelphia, USA.
Cureus. 2024 Oct 8;16(10):e71105. doi: 10.7759/cureus.71105. eCollection 2024 Oct.
Introduction: Minimally invasive spine surgery (MISS) has evolved over the last three decades as a less invasive alternative to traditional spine surgery, offering benefits such as smaller incisions, faster recovery, and lower complication rates. Because patients frequently seek information about MISS online, the comprehensibility and accuracy of that information are crucial. Recent studies have shown that much of the online material on spine surgery exceeds recommended readability levels, making it difficult for patients to understand. This study examines the clinical appropriateness and readability of responses generated by Chat Generative Pre-Trained Transformer (ChatGPT) to frequently asked questions (FAQs) about MISS.

Methods: A set of 15 FAQs was formulated based on clinical expertise and the existing literature on MISS. Each question was independently entered into ChatGPT five times, and the generated responses were evaluated by three neurosurgery attendings for clinical appropriateness, judged on accuracy, readability, and patient accessibility. Readability was assessed with seven standardized readability tests, including the Flesch-Kincaid Grade Level and Flesch Reading Ease (FRE) scores. Statistical analysis compared readability scores across preoperative, postoperative, and intraoperative/technical question categories.

Results: The mean readability scores for preoperative, postoperative, and intraoperative/technical questions were 15±2.8, 16±3, and 15.7±3.2, respectively, significantly exceeding the sixth- to eighth-grade reading level recommended for patient education (p=0.017). Differences in readability across individual questions were also statistically significant (p<0.001). All responses required a reading level above the 11th grade, and the majority required college-level comprehension. Although preoperative and postoperative questions generally elicited clinically appropriate responses, 50% of intraoperative/technical questions yielded either "inappropriate" or "unreliable" responses, particularly inquiries about radiation exposure and the use of lasers in MISS.

Conclusions: While ChatGPT is proficient at providing clinically appropriate responses to certain FAQs about MISS, it frequently produces responses that exceed the recommended readability level for patient education. This limitation suggests that its utility may be confined to highly educated patients, potentially exacerbating existing disparities in patient comprehension. Future AI-based patient education tools must prioritize clear, accessible communication, with oversight from medical professionals to ensure accuracy and appropriateness. Further research comparing ChatGPT's performance with that of other AI models could broaden its application in patient education across medical specialties.
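The two named readability metrics are closed-form formulas over word, sentence, and syllable counts. The sketch below is not the authors' code: it shows how FRE and the Flesch-Kincaid Grade Level (FKGL) are conventionally computed, with a rough vowel-run syllable heuristic assumed for illustration. An FKGL near 15-16, as reported here, corresponds to college-level text.

```python
# Minimal sketch of the two readability formulas named in the abstract.
# The syllable counter is a crude vowel-group heuristic (an assumption
# for illustration); published tools use dictionary-based counts.
import re

def count_syllables(word: str) -> int:
    # Approximate syllables as runs of vowels; floor at 1 per word.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def readability(text: str) -> tuple[float, float]:
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    if not words:
        return 0.0, 0.0
    syllables = sum(count_syllables(w) for w in words)
    wps = len(words) / sentences        # words per sentence
    spw = syllables / len(words)        # syllables per word
    fre = 206.835 - 1.015 * wps - 84.6 * spw   # higher = easier to read
    fkgl = 0.39 * wps + 11.8 * spw - 15.59     # approximate U.S. grade level
    return fre, fkgl

fre, fkgl = readability("Minimally invasive spine surgery uses smaller incisions.")
print(f"FRE: {fre:.1f}, FKGL: {fkgl:.1f}")
```

A sixth- to eighth-grade target corresponds roughly to FRE scores of 60-80; the college-level responses reported in this study would score far lower on FRE and higher on FKGL.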
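The abstract reports p=0.017 for the three-way category comparison but does not name the test used. A one-way ANOVA is one plausible choice for comparing mean readability scores across three independent groups; the sketch below uses entirely hypothetical score lists, not study data.

```python
# Minimal sketch, not the authors' analysis: a one-way ANOVA comparing
# readability scores across the three question categories. The score
# lists are hypothetical placeholders, not data from the study.
from scipy.stats import f_oneway

preoperative = [14.2, 15.8, 13.1, 17.5, 15.4]    # hypothetical FKGL scores
postoperative = [16.9, 15.2, 18.0, 14.8, 16.1]   # hypothetical
intraoperative = [15.0, 17.9, 12.8, 16.3, 16.5]  # hypothetical

f_stat, p_value = f_oneway(preoperative, postoperative, intraoperative)
print(f"F = {f_stat:.2f}, p = {p_value:.3f}")
```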