
Comparing physician and artificial intelligence chatbot responses to posthysterectomy questions posted to a public social media forum.

Author information

Beale Shadae K, Cohen Natalie, Secheli Beatrice, McIntire Donald, Kho Kimberly A

Affiliations

Department of Obstetrics and Gynecology, University of Texas Southwestern Medical Center, Dallas, TX (Beale, Cohen, Secheli, and McIntire).

Department of Obstetrics, Gynecology & Women's Health, University of Hawaii, Honolulu, HI (Kho).

Publication information

AJOG Glob Rep. 2025 Aug 5;5(3):100553. doi: 10.1016/j.xagr.2025.100553. eCollection 2025 Aug.

Abstract

BACKGROUND

Within public online forums, patients often seek reassurance and guidance from the community regarding postoperative symptoms and expectations, and when to seek medical assistance. Others turn to artificial intelligence, whether through online search engines or chatbots such as ChatGPT or Perplexity. Artificial intelligence chatbot assistants have been growing in popularity; however, clinicians may be hesitant to use them because of concerns about accuracy. Doximity, an online networking service for medical professionals, has expanded its resources to include a Health Insurance Portability and Accountability Act-compliant artificial intelligence writing assistant, Doximity GPT, designed to reduce the administrative burden on clinicians. Health professionals learn through a "medical model," which differs greatly from the "health belief model" through which laypeople learn. This mismatch in perspectives likely contributes to communication gaps even during digital clinician-patient encounters, especially for patients with limited health literacy during the perioperative period, when complications may arise.

OBJECTIVE

This study aimed to evaluate the ability of artificial intelligence chatbot assistants (Doximity GPT, Perplexity, and ChatGPT) to generate quality, accurate, and empathetic responses to postoperative patient queries that are also understandable and actionable.

STUDY DESIGN

Responses to 10 postoperative queries sourced from HysterSisters, a public forum for "woman-to-woman hysterectomy support," were generated using 3 artificial intelligence chatbot assistants (Doximity GPT, Perplexity, and ChatGPT) and a minimally invasive gynecologic surgery fellowship-trained surgeon. Ten physician evaluators compared the blinded responses for quality, accuracy, and empathy. A separate pair of physician evaluators scored the responses for understandability and actionability using the Patient Education Materials Assessment Tool for Printable Materials (PEMAT-P); the final scores were the average of the two reviewers' scores. Analysis of variance was used for pairwise comparison of evaluator scores between sources. The Kruskal-Wallis test was used to analyze Flesch-Kincaid readability scores, and the Pearson chi-square test was used to assess differences in reading level among the responses from each source.
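The statistical approach above (analysis of variance across sources, Kruskal-Wallis on readability scores, and a chi-square test on reading-level counts) can be sketched with SciPy. All scores and tables below are hypothetical placeholders for illustration, not the study's data:

```python
# Illustrative sketch of the study-design statistics using SciPy.
# All values are made up; the study's actual data are not public here.
from scipy import stats

# Hypothetical evaluator ratings (e.g., Likert-type) per response source.
quality = {
    "surgeon":    [4, 5, 4, 4, 5, 4, 3, 5, 4, 4],
    "doximity":   [5, 4, 5, 4, 5, 5, 4, 4, 5, 4],
    "perplexity": [3, 2, 3, 3, 2, 3, 4, 2, 3, 3],
    "chatgpt":    [4, 5, 4, 5, 4, 4, 5, 4, 4, 5],
}

# One-way ANOVA across sources; pairwise comparisons would follow with
# a post hoc procedure such as Tukey HSD.
f_stat, p_anova = stats.f_oneway(*quality.values())

# Kruskal-Wallis (non-parametric) on hypothetical Flesch-Kincaid
# readability scores, one list per source.
readability = {
    "surgeon":    [60.6, 55.2, 68.4, 58.1],
    "doximity":   [48.3, 41.7, 52.0, 45.5],
    "perplexity": [40.0, 28.6, 47.2, 35.9],
    "chatgpt":    [35.5, 28.2, 42.0, 31.1],
}
h_stat, p_kw = stats.kruskal(*readability.values())

# Pearson chi-square on a reading-level contingency table
# (rows: sources; columns: counts at each grade band).
table = [[7, 3], [2, 8], [1, 9], [2, 8]]
chi2, p_chi2, dof, expected = stats.chi2_contingency(table)

print(f"ANOVA p={p_anova:.3f}, Kruskal-Wallis p={p_kw:.3f}, chi2 p={p_chi2:.3f}")
```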

RESULTS

Doximity GPT and ChatGPT were rated as more empathetic than the minimally invasive gynecologic surgeon, whereas quality and accuracy were similar across these sources. Perplexity scored significantly lower than the other response sources for quality and accuracy (P<.001). Perplexity and the minimally invasive gynecologic surgeon ranked similarly for empathy. Reading ease was greater for the surgeon's responses (60.6 [53.5-68.4]; eighth to ninth grade) than for Perplexity (40.0 [28.6-47.2]; college) and ChatGPT (35.5 [28.2-42.0]; college) (P<.01). There was no significant difference in understandability or actionability, with all sources scored as having good understandability and average actionability.
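The reading-ease figures above come from the Flesch Reading Ease formula, where scores of roughly 60-70 correspond to an eighth-to-ninth-grade reading level and 30-50 to college-level text. A minimal sketch follows; the vowel-group syllable counter is a rough heuristic, not the validated tooling a study would use:

```python
# Minimal Flesch Reading Ease sketch. The syllable counter is an
# approximation (counts runs of consecutive vowels), so scores will
# deviate somewhat from validated readability tools.
import re

def count_syllables(word: str) -> int:
    # Approximate syllables as groups of consecutive vowels, minimum 1.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_reading_ease(text: str) -> float:
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    # Flesch Reading Ease: higher = easier to read.
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))

simple = "Rest after surgery. Call your doctor if pain worsens."
complex_ = "Postoperative convalescence necessitates comprehensive multidisciplinary evaluation."
print(flesch_reading_ease(simple), flesch_reading_ease(complex_))
```

Short sentences with few syllables per word push the score up; dense clinical vocabulary pushes it down, which is the pattern the results report between the surgeon's responses and the chatbots'.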

CONCLUSION

As artificial intelligence chatbot assistants grow in popularity, including through integration into the electronic health record, their output's readability must reflect the general population's health literacy to be impactful and effective. This analysis serves as a reminder for physicians to be mindful of this mismatch between output readability and general health literacy when considering the integration of artificial intelligence chatbot assistants into patient care. The accuracy and consistency of these chatbots may also affect patient outcomes, making careful screening of their output of utmost importance.


Figure: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bd5b/12410437/f9a5b7bf01ac/gr5.jpg
