Evaluation of Three Large Language Models' Response Performances to Inquiries Regarding Post-Abortion Care in the Context of Chinese Language: A Comparative Analysis.

Authors

Xue Danyue, Liao Sha

Affiliations

Department of Operating Room Nursing, West China Second University Hospital, Sichuan University, Chengdu, Sichuan, People's Republic of China.

Key Laboratory of Birth Defects and Related Diseases of Women and Children (Sichuan University), Ministry of Education, Chengdu, Sichuan, People's Republic of China.

Publication information

Risk Manag Healthc Policy. 2025 Aug 18;18:2731-2741. doi: 10.2147/RMHP.S531777. eCollection 2025.

DOI:10.2147/RMHP.S531777
PMID:40862290
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12372831/
Abstract

BACKGROUND

This study aimed to evaluate the response performances of three large language models (LLMs) (ChatGPT, Kimi, and Ernie Bot) to inquiries regarding post-abortion care (PAC) in the context of the Chinese language.

METHODS

The data was collected in October 2024. Twenty questions concerning the necessity of contraception after induced abortion, the best time for contraception, choice of a contraceptive method, contraceptive effectiveness, and the potential impact of contraception on fertility were used in this study. Each question was asked three times in Chinese for each LLM. Three PAC consultants conducted the evaluations. A Likert scale was used to score the responses based on accuracy, relevance, completeness, clarity, and reliability.

RESULTS

The numbers of responses rated "good" (mean score > 4), "average" (3 < mean score ≤ 4), and "poor" (mean score ≤ 3) in the overall evaluation were 159 (88.30%), 19 (10.57%), and 2 (1.10%), respectively. No statistically significant differences were identified in the overall evaluation among the three LLMs (P = 0.352). The numbers of responses evaluated as good for accuracy, relevance, completeness, clarity, and reliability were 87 (48.33%), 154 (85.53%), 136 (75.57%), 133 (73.87%), and 128 (71.10%), respectively. No statistically significant differences were identified in accuracy, relevance, completeness, or clarity among the three LLMs. A statistically significant difference was identified in reliability (P < 0.001).
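The grading rule in the results can be illustrated with a short sketch. It is not the authors' analysis code. It assumes each response gets one Likert score per consultant and that the mean of those scores is compared against the stated cutoffs. The function name `classify_response` and the example ratings are hypothetical. The abstract does not state the Likert scale range, so the example scores are only illustrative.

```python
from statistics import mean


def classify_response(scores):
    """Label one response from its raters' Likert scores.

    Cutoffs follow the abstract: good if mean > 4,
    average if 3 < mean <= 4, poor if mean <= 3.
    """
    avg = mean(scores)
    if avg > 4:
        return "good"
    if avg > 3:
        return "average"
    return "poor"


# Hypothetical ratings from three consultants (not data from the study).
example_ratings = {
    "response_a": [5, 4, 5],
    "response_b": [4, 4, 3],
    "response_c": [3, 3, 2],
}

labels = {name: classify_response(s) for name, s in example_ratings.items()}

# 20 questions, each asked 3 times, to each of 3 LLMs.
total_responses = 20 * 3 * 3

print(labels)
print(total_responses)
```

The total of 180 responses implied by the study design matches the reported overall counts (159 + 19 + 2).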

CONCLUSION

The three LLMs performed well overall and showed great potential for application in PAC consultations. The accuracy of the LLMs' responses should be improved through continuous training and evaluation.
