
Comparing physician and artificial intelligence chatbot responses to posthysterectomy questions posted to a public social media forum.

Author information

Beale Shadae K, Cohen Natalie, Secheli Beatrice, McIntire Donald, Kho Kimberly A

Affiliations

Department of Obstetrics and Gynecology, University of Texas Southwestern Medical Center, Dallas, TX (Beale, Cohen, Secheli, and McIntire).

Department of Obstetrics, Gynecology & Women's Health, University of Hawaii, Honolulu, HI (Kho).

Publication information

AJOG Glob Rep. 2025 Aug 5;5(3):100553. doi: 10.1016/j.xagr.2025.100553. eCollection 2025 Aug.

Abstract

BACKGROUND

Within public online forums, patients often seek reassurance and guidance from the community regarding postoperative symptoms and expectations, and when to seek medical assistance. Others turn to artificial intelligence, whether through online search engines or chatbots such as ChatGPT or Perplexity. Artificial intelligence chatbot assistants have been growing in popularity; however, clinicians may be hesitant to use them because of concerns about accuracy. Doximity, an online networking service for medical professionals, has expanded its resources to include a Health Insurance Portability and Accountability Act-compliant artificial intelligence writing assistant, Doximity GPT, designed to reduce the administrative burden on clinicians. Health professionals learn through a "medical model," which differs greatly from the "health belief model" through which laypeople learn. This mismatch in perspectives likely contributes to communication gaps even during digital clinician-patient encounters, especially for patients with limited health literacy during the perioperative period, when complications may arise.

OBJECTIVE

This study aimed to evaluate the ability of artificial intelligence chatbot assistants (Doximity GPT, Perplexity, and ChatGPT) to generate quality, accurate, and empathetic responses to postoperative patient queries that are also understandable and actionable.

STUDY DESIGN

Responses to 10 postoperative queries sourced from HysterSisters, a public forum for "woman-to-woman hysterectomy support," were generated using 3 artificial intelligence chatbot assistants (Doximity GPT, Perplexity, and ChatGPT) and a minimally invasive gynecologic surgery fellowship-trained surgeon. Ten physician evaluators compared the blinded responses for quality, accuracy, and empathy. A separate pair of physician evaluators scored the responses for understandability and actionability using the Patient Education Materials Assessment Tool for Printable Materials (PEMAT-P); the final scores were the average of the two reviewers' scores. Analysis of variance was used for pairwise comparison of evaluator scores between sources. The Kruskal-Wallis test was used to analyze Flesch-Kincaid readability scores, and the Pearson chi-square test was used to assess differences in reading level among the responses from each source.
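The statistical approach above (analysis of variance across sources, Kruskal-Wallis on readability scores, and a chi-square test on reading-level counts) can be sketched with SciPy. All scores and tables below are hypothetical placeholders for illustration, not the study's data:

```python
# Illustrative sketch of the study-design statistics using SciPy.
# All values are made up; the study's actual data are not public here.
from scipy import stats

# Hypothetical evaluator ratings (e.g., Likert-type) per response source.
quality = {
    "surgeon":    [4, 5, 4, 4, 5, 4, 3, 5, 4, 4],
    "doximity":   [5, 4, 5, 4, 5, 5, 4, 4, 5, 4],
    "perplexity": [3, 2, 3, 3, 2, 3, 4, 2, 3, 3],
    "chatgpt":    [4, 5, 4, 5, 4, 4, 5, 4, 4, 5],
}

# One-way ANOVA across sources; pairwise comparisons would follow with
# a post hoc procedure such as Tukey HSD.
f_stat, p_anova = stats.f_oneway(*quality.values())

# Kruskal-Wallis (non-parametric) on hypothetical Flesch-Kincaid
# readability scores, one list per source.
readability = {
    "surgeon":    [60.6, 55.2, 68.4, 58.1],
    "doximity":   [48.3, 41.7, 52.0, 45.5],
    "perplexity": [40.0, 28.6, 47.2, 35.9],
    "chatgpt":    [35.5, 28.2, 42.0, 31.1],
}
h_stat, p_kw = stats.kruskal(*readability.values())

# Pearson chi-square on a reading-level contingency table
# (rows: sources; columns: counts at each grade band).
table = [[7, 3], [2, 8], [1, 9], [2, 8]]
chi2, p_chi2, dof, expected = stats.chi2_contingency(table)

print(f"ANOVA p={p_anova:.3f}, Kruskal-Wallis p={p_kw:.3f}, chi2 p={p_chi2:.3f}")
```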

RESULTS

Doximity GPT and ChatGPT were rated as more empathetic than the minimally invasive gynecologic surgeon, whereas quality and accuracy were similar across these sources. Perplexity scored significantly lower than the other response sources for quality and accuracy (P<.001). Perplexity and the minimally invasive gynecologic surgeon ranked similarly for empathy. Reading ease was greater for the surgeon's responses (60.6 [53.5-68.4]; eighth to ninth grade) than for Perplexity (40.0 [28.6-47.2]; college) and ChatGPT (35.5 [28.2-42.0]; college) (P<.01). There was no significant difference in understandability or actionability, with all sources scored as having good understandability and average actionability.
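The reading-ease figures above come from the Flesch Reading Ease formula, where scores of roughly 60-70 correspond to an eighth-to-ninth-grade reading level and 30-50 to college-level text. A minimal sketch follows; the vowel-group syllable counter is a rough heuristic, not the validated tooling a study would use:

```python
# Minimal Flesch Reading Ease sketch. The syllable counter is an
# approximation (counts runs of consecutive vowels), so scores will
# deviate somewhat from validated readability tools.
import re

def count_syllables(word: str) -> int:
    # Approximate syllables as groups of consecutive vowels, minimum 1.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_reading_ease(text: str) -> float:
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    # Flesch Reading Ease: higher = easier to read.
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))

simple = "Rest after surgery. Call your doctor if pain worsens."
complex_ = "Postoperative convalescence necessitates comprehensive multidisciplinary evaluation."
print(flesch_reading_ease(simple), flesch_reading_ease(complex_))
```

Short sentences with few syllables per word push the score up; dense clinical vocabulary pushes it down, which is the pattern the results report between the surgeon's responses and the chatbots'.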

CONCLUSION

As artificial intelligence chatbot assistants grow in popularity, including through integration into the electronic health record, their output's readability must reflect the general population's health literacy to be impactful and effective. This analysis serves as a reminder for physicians to be mindful of this mismatch between output readability and general health literacy when considering the integration of artificial intelligence chatbot assistants into patient care. The accuracy and consistency of these chatbots may also affect patient outcomes, making careful screening of their output of utmost importance.


Figure: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bd5b/12410437/f9a5b7bf01ac/gr5.jpg
