A Comparative Analysis of Artificial Intelligence Platforms: ChatGPT-4o and Google Gemini in Answering Questions About Birth Control Methods.

Author information

Muluk Erhan

Affiliation

Obstetrics and Gynaecology, Anatolia Hospital, Antalya, TUR.

Publication information

Cureus. 2025 Jan 1;17(1):e76745. doi: 10.7759/cureus.76745. eCollection 2025 Jan.

Abstract

Background: Birth control methods (BCMs) are often underutilized or misunderstood, especially among young individuals entering their reproductive years. With the growing reliance on artificial intelligence (AI) platforms for health-related information, this study evaluates the performance of ChatGPT-4o and Google Gemini in addressing commonly asked questions about BCMs.

Methods: Thirty questions, derived from the American College of Obstetricians and Gynecologists (ACOG) website, were posed to both AI platforms. The questions spanned four categories: general contraception, specific contraceptive types, emergency contraception, and other topics. Responses were evaluated using a five-point rubric assessing Relevance, Completeness, and Lack of False Information (RCL). Overall scores were calculated by averaging the rubric scores. Statistical analysis, including the Wilcoxon signed-rank test, Friedman test, and Kruskal-Wallis test, was performed to compare metrics.

Results: ChatGPT-4o and Google Gemini provided high-quality responses to birth control-related queries, with overall scores averaging 4.38 ± 0.58 and 4.37 ± 0.52, respectively, both categorized as "very good" to "excellent." ChatGPT-4o scored higher on lack of false information, based on descriptive statistics (4.70 ± 0.60 vs. 4.47 ± 0.73), while Google Gemini scored higher on relevance, with a statistically significant difference (4.53 ± 0.57 vs. 4.30 ± 0.70, p = 0.035, large effect size). Completeness scores were comparable (p = 0.655). There was no significant difference in overall performance (p = 0.548), though Google Gemini showed a potential trend toward stronger performance in the "Other Topics" category. Within each model, ChatGPT-4o showed more pronounced differences among the three metrics (moderate effect size, Kendall's W = 0.357), while Google Gemini showed smaller differences (Kendall's W = 0.165). These findings suggest that both platforms offer reliable and complementary tools for addressing knowledge gaps in contraception, with nuanced strengths that warrant further exploration.

Conclusions: ChatGPT-4o and Google Gemini provided reliable and accurate responses to BCM-related queries, with slight differences in strengths. These findings underscore the potential of AI tools in addressing public health information needs, particularly for young individuals seeking guidance on contraception. Further studies with larger datasets may elucidate nuanced differences between AI platforms.
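To make the scoring arithmetic concrete, the following is a minimal sketch, not the author's analysis code. It assumes the overall score for a response is the plain mean of its three rubric scores, and it computes Kendall's W (the effect size reported alongside the Friedman test) with the textbook formula W = 12S / (n²(k³ − k)) without a tie correction. Statistical packages usually apply a tie correction, so values from this sketch may differ from those in the abstract. The function names and the example scores are hypothetical.

```python
from statistics import mean


def overall_score(relevance, completeness, no_false_info):
    """Assumed overall score: plain mean of the three rubric scores (1-5 each)."""
    return mean([relevance, completeness, no_false_info])


def average_ranks(values):
    """Rank values ascending (1 = smallest); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def kendalls_w(rows):
    """Kendall's W for n rows (questions) of k scores (metrics), no tie correction."""
    n = len(rows)
    k = len(rows[0])
    rank_sums = [0.0] * k
    for row in rows:
        for col, rank in enumerate(average_ranks(row)):
            rank_sums[col] += rank
    expected = n * (k + 1) / 2  # rank sum per metric if the metrics did not differ
    s = sum((rs - expected) ** 2 for rs in rank_sums)
    return 12 * s / (n ** 2 * (k ** 3 - k))


# Hypothetical scores for four questions: (relevance, completeness, no false info).
example_rows = [[3, 4, 5], [3, 4, 5], [3, 4, 5], [3, 4, 5]]
w_example = kendalls_w(example_rows)
```

W ranges from 0 (the metrics are ranked inconsistently across questions) to 1 (every question ranks the metrics in the same order), so the larger value reported for ChatGPT-4o corresponds to more consistent differences among its three metric scores.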

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/765b/11785371/a35974ad845a/cureus-0017-00000076745-i01.jpg