Dolu Fatih, Ay Oğuzhan Fatih, Kupeli Aydın Hakan, Karademir Enes, Büyükavcı Muhammed Huseyin
Department of Surgical Oncology, Kahramanmaras Necip Fazıl City Hospital, Kahramanmaras, Turkey.
Department of General Surgery, Kahramanmaras Necip Fazıl City Hospital, Merkez, Erkenez Mh., Recep Tayyip Erdoğan Bulvarı 12. Km, Kahramanmaras, 46050, Turkey, 90 3442282800.
JMIR Med Inform. 2025 Jul 24;13:e68980. doi: 10.2196/68980.
The integration of artificial intelligence (AI) into clinical workflows holds promise for enhancing outpatient decision-making and patient education. ChatGPT, a large language model developed by OpenAI, has gained attention for its potential to support both clinicians and patients. However, its performance in the outpatient setting of general surgery remains underexplored.
This study aimed to evaluate whether ChatGPT-4 can function as a virtual outpatient assistant in the management of puerperal mastitis by assessing the accuracy, clarity, and clinical safety of its responses to frequently asked patient questions in Turkish.
Fifteen questions about puerperal mastitis were sourced from public health care websites and online forums. These questions were categorized into general information (n=2), symptoms and diagnosis (n=6), treatment (n=2), and prognosis (n=5). Each question was entered into ChatGPT-4 (September 3, 2024), and a single Turkish-language response was obtained. The responses were evaluated by a panel consisting of 3 board-certified general surgeons and 2 general surgery residents, using five criteria: sufficient length, patient-understandable language, accuracy, adherence to current guidelines, and patient safety. Quantitative metrics included the DISCERN score, Flesch-Kincaid readability score, and inter-rater reliability assessed using the intraclass correlation coefficient (ICC).
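As an illustrative sketch of one of the quantitative metrics above, the Flesch Reading Ease score is computed from word, sentence, and syllable counts. The coefficients below are the standard English-language ones; Turkish-language adaptations (such as the Ateşman formula) use different coefficients, and the abstract does not specify which variant was applied:

```python
def flesch_reading_ease(total_words, total_sentences, total_syllables):
    """Flesch Reading Ease with English coefficients.

    Higher scores indicate easier-to-read text (roughly 60-70 is
    "plain English"). Turkish adaptations replace these constants.
    """
    words_per_sentence = total_words / total_sentences
    syllables_per_word = total_syllables / total_words
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


# Example: a 100-word passage in 5 sentences with 150 syllables
score = flesch_reading_ease(100, 5, 150)
```

In practice, word, sentence, and syllable counts would be extracted from each ChatGPT response before applying the formula.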
A total of 15 questions were evaluated. ChatGPT's responses were rated as "excellent" overall by the evaluators, with higher scores for treatment- and prognosis-related questions. DISCERN scores differed significantly across question types (P=.01), with treatment and prognosis questions rated higher. In contrast, no significant differences were detected in evaluator-based ratings (sufficient length, understandability, accuracy, guideline compliance, and patient safety), JAMA benchmark scores, or Flesch-Kincaid readability levels (P>.05 for all). Inter-rater agreement was good overall (ICC=0.772) but varied across individual criteria. Correlation analyses revealed no significant overall associations between subjective ratings and objective quality measures, although a strong positive correlation between literature compliance and patient safety was identified for one question (r=0.968, P<.001).
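A minimal sketch of the correlation analysis reported above, assuming a Pearson coefficient (the abstract reports r values but does not name the method); the input values here are illustrative, not the study's data:

```python
def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Covariance numerator and the product of standard-deviation terms
    num = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    den = (sum((a - mean_x) ** 2 for a in x)
           * sum((b - mean_y) ** 2 for b in y)) ** 0.5
    return num / den


# Hypothetical per-rater scores for one question:
# literature-compliance vs patient-safety ratings
compliance = [4, 5, 4, 5, 5]
safety = [4, 5, 4, 5, 4]
r = pearson_r(compliance, safety)
```

An r near 1 indicates that raters who scored literature compliance higher also tended to score patient safety higher, which is the pattern the study reports for one question.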
ChatGPT demonstrated adequate capability in providing information on puerperal mastitis, particularly for treatment and prognosis. However, evaluator variability and the subjective nature of assessments highlight the need for further optimization of AI tools. Future research should emphasize iterative questioning and dynamic updates to AI knowledge bases to enhance reliability and accessibility.