Dolu Fatih, Ay Oğuzhan Fatih, Kupeli Aydın Hakan, Karademir Enes, Büyükavcı Muhammed Huseyin
Department of Surgical Oncology, Kahramanmaras Necip Fazıl City Hospital, Kahramanmaras, Turkey.
Department of General Surgery, Kahramanmaras Necip Fazıl City Hospital, Merkez, Erkenez Mh., Recep Tayyip Erdoğan Bulvarı 12. Km, Kahramanmaras, 46050, Turkey, 90 3442282800.
JMIR Med Inform. 2025 Jul 24;13:e68980. doi: 10.2196/68980.
The integration of artificial intelligence (AI) into clinical workflows holds promise for enhancing outpatient decision-making and patient education. ChatGPT, a large language model developed by OpenAI, has gained attention for its potential to support both clinicians and patients. However, its performance in the outpatient setting of general surgery remains underexplored.
This study aimed to evaluate whether ChatGPT-4 can function as a virtual outpatient assistant in the management of puerperal mastitis by assessing the accuracy, clarity, and clinical safety of its responses to frequently asked patient questions in Turkish.
Fifteen questions about puerperal mastitis were sourced from public health care websites and online forums. These questions were categorized into general information (n=2), symptoms and diagnosis (n=6), treatment (n=2), and prognosis (n=5). Each question was entered into ChatGPT-4 (September 3, 2024), and a single Turkish-language response was obtained. The responses were evaluated by a panel consisting of 3 board-certified general surgeons and 2 general surgery residents, using five criteria: sufficient length, patient-understandable language, accuracy, adherence to current guidelines, and patient safety. Quantitative metrics included the DISCERN score, Flesch-Kincaid readability score, and inter-rater reliability assessed using the intraclass correlation coefficient (ICC).
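As an illustrative sketch of one of the quantitative metrics above, the Flesch Reading Ease score is computed from word, sentence, and syllable counts. The coefficients below are the standard English-language ones; Turkish-language adaptations (such as the Ateşman formula) use different coefficients, and the abstract does not specify which variant was applied:

```python
def flesch_reading_ease(total_words, total_sentences, total_syllables):
    """Flesch Reading Ease with English coefficients.

    Higher scores indicate easier-to-read text (roughly 60-70 is
    "plain English"). Turkish adaptations replace these constants.
    """
    words_per_sentence = total_words / total_sentences
    syllables_per_word = total_syllables / total_words
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


# Example: a 100-word passage in 5 sentences with 150 syllables
score = flesch_reading_ease(100, 5, 150)
```

In practice, word, sentence, and syllable counts would be extracted from each ChatGPT response before applying the formula.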
A total of 15 questions were evaluated. ChatGPT's responses were rated as "excellent" overall by the evaluators, with higher scores for treatment- and prognosis-related questions. DISCERN scores differed significantly across question types (P=.01), with treatment and prognosis questions rated higher. In contrast, no significant differences were detected in evaluator-based ratings (sufficient length, understandability, accuracy, guideline compliance, and patient safety), JAMA benchmark scores, or Flesch-Kincaid readability levels (P>.05 for all). Inter-rater agreement was good overall (ICC=0.772) but varied across individual criteria. Correlation analyses revealed no significant overall associations between subjective ratings and objective quality measures, although a strong positive correlation between literature compliance and patient safety was identified for one question (r=0.968, P<.001).
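A minimal sketch of the correlation analysis reported above, assuming a Pearson coefficient (the abstract reports r values but does not name the method); the input values here are illustrative, not the study's data:

```python
def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Covariance numerator and the product of standard-deviation terms
    num = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    den = (sum((a - mean_x) ** 2 for a in x)
           * sum((b - mean_y) ** 2 for b in y)) ** 0.5
    return num / den


# Hypothetical per-rater scores for one question:
# literature-compliance vs patient-safety ratings
compliance = [4, 5, 4, 5, 5]
safety = [4, 5, 4, 5, 4]
r = pearson_r(compliance, safety)
```

An r near 1 indicates that raters who scored literature compliance higher also tended to score patient safety higher, which is the pattern the study reports for one question.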
ChatGPT demonstrated adequate capability in providing information on puerperal mastitis, particularly for treatment and prognosis. However, evaluator variability and the subjective nature of assessments highlight the need for further optimization of AI tools. Future research should emphasize iterative questioning and dynamic updates to AI knowledge bases to enhance reliability and accessibility.