Evaluation of artificial intelligence (AI) chatbots for providing sexual health information: a consensus study using real-world clinical queries.

Author information

Latt Phyu M, Aung Ei T, Htaik Kay, Soe Nyi N, Lee David, King Alicia J, Fortune Ria, Ong Jason J, Chow Eric P F, Bradshaw Catriona S, Rahman Rashidur, Deneen Matthew, Dobinson Sheranne, Randall Claire, Zhang Lei, Fairley Christopher K

Affiliations

Melbourne Sexual Health Centre, Alfred Health, Melbourne, Australia.

School of Translational Medicine, Faculty of Medicine, Nursing and Health Sciences, Monash University, Melbourne, Australia.

Publication information

BMC Public Health. 2025 May 15;25(1):1788. doi: 10.1186/s12889-025-22933-8.

DOI:10.1186/s12889-025-22933-8
PMID:40375254
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12080182/
Abstract

INTRODUCTION

Artificial intelligence (AI) chatbots could potentially provide the public with information on sensitive topics, including sexual health. However, their performance relative to nurses, and across different AI chatbots, remains understudied, particularly in the field of sexual health. This study evaluated three AI chatbots, two prompt-tuned (Alice and Azure) and one standard (ChatGPT by OpenAI), on how well they provided sexual health information in response to questions that experienced sexual health nurses could answer correctly.

METHODS

We analysed 195 anonymised sexual health questions received by the Melbourne Sexual Health Centre phone line. Using a consensus-based approach, a panel of experts evaluated, in blinded order, the responses to these questions from nurses and from the three AI chatbots. Performance was assessed on overall correctness and on five specific measures: guidance, accuracy, safety, ease of access, and provision of necessary information. We conducted subgroup analyses comparing clinic-specific questions (e.g., opening hours) with general sexual health questions, as well as a sensitivity analysis excluding questions that Azure could not answer.

RESULTS

Alice demonstrated the highest overall correctness (85.2%; 95% confidence interval (CI), 82.1-88.0%), followed by Azure (69.3%; 95% CI, 65.3-73.0%) and ChatGPT (64.8%; 95% CI, 60.7-68.7%). Prompt-tuned chatbots outperformed the base ChatGPT across all measures. Among all outcome measures, all chatbots performed best on safety, with Azure achieving the highest safety score (97.9%; 95% CI, 96.4-98.9%), indicating the lowest risk of providing potentially harmful advice. In subgroup analysis, all chatbots performed better on general sexual health questions compared to clinic-specific queries. Sensitivity analysis showed a narrower performance gap between Alice and Azure when excluding questions Azure could not answer.
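Each result above is a proportion reported with a 95% confidence interval. The abstract does not state which interval method was used or which denominator each percentage is based on, so the following is only an illustrative sketch: it computes a Wilson score interval, one common choice for binomial proportions, for hypothetical counts that are not taken from the study.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.959964) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion.

    The default z (about 1.96) gives roughly 95% nominal coverage.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= successes <= n:
        raise ValueError("successes must be between 0 and n")
    p_hat = successes / n
    z2 = z * z
    denom = 1 + z2 / n
    # Centre is shrunk slightly towards 0.5 relative to p_hat.
    centre = (p_hat + z2 / (2 * n)) / denom
    half_width = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / n + z2 / (4 * n * n))
    return centre - half_width, centre + half_width


if __name__ == "__main__":
    # Hypothetical counts for illustration only: 170 of 200 rated responses judged correct.
    low, high = wilson_interval(170, 200)
    print(f"{170 / 200:.1%} (95% CI {low:.1%}-{high:.1%})")
```

The Wilson interval is used here because, unlike the simple normal-approximation interval, it stays within [0, 1] and behaves reasonably for proportions near 0% or 100%, such as the high safety scores; the study may have used a different method.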

CONCLUSIONS

Prompt-tuned AI chatbots performed better than base ChatGPT in providing sexual health information, and the high safety scores were particularly noteworthy. However, all AI chatbots were susceptible to generating incorrect information. These findings suggest that AI chatbots could serve as adjuncts to human healthcare providers in providing sexual health information, while highlighting the need for continued refinement and human oversight. Future research should focus on larger-scale evaluations and real-world implementations.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3864/12080182/d1adac22632c/12889_2025_22933_Fig1_HTML.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3864/12080182/8aef99f63b27/12889_2025_22933_Fig2_HTML.jpg
