Evaluation of research methodology generation by large language models in laryngology: a comparative analysis of ChatGPT-4.0 and Gemini 1.5 flash.

Author information

Türe Nurullah, Umurhan Elif, Tahir Emel

Affiliations

Department of Otorhinolaryngology, Kütahya Health Sciences University, Kütahya, Türkiye.

Department of Otorhinolaryngology, Ondokuz Mayıs University, Samsun, Türkiye.

Publication information

Eur Arch Otorhinolaryngol. 2025 Sep 18. doi: 10.1007/s00405-025-09656-7.

Abstract

OBJECTIVES

This study aimed to compare the ability of two major language models, ChatGPT-4.0 and Gemini 1.5 Flash, to establish a research methodology based on scientific publications in laryngology.

METHODS

We screened 80 articles selected from five prestigious otolaryngology journals and included 60 articles with a methods section and statistical analysis. These were classified according to six research types: cell culture, animal experiments, prospective, retrospective, systematic review, and artificial intelligence. A total of 30 studies were analyzed, with five articles randomly selected from each group. For each article, both language models were asked to produce research methodologies, and the responses were evaluated by two independent raters.
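
The selection step described above (five articles drawn at random from each of the six research types) can be sketched as follows. This is only an illustration, not the authors' procedure: the article identifiers, the even split of 10 articles per type, and the fixed seed are all assumptions made for the example.

```python
import random
from collections import defaultdict

RESEARCH_TYPES = [
    "cell culture",
    "animal experiment",
    "prospective",
    "retrospective",
    "systematic review",
    "artificial intelligence",
]


def sample_per_type(articles, per_type=5, seed=0):
    """Randomly pick `per_type` article ids from each research type.

    `articles` is an iterable of (article_id, research_type) pairs.
    The seed is fixed only to make the sketch reproducible.
    """
    rng = random.Random(seed)
    by_type = defaultdict(list)
    for article_id, research_type in articles:
        by_type[research_type].append(article_id)

    selected = {}
    for research_type in RESEARCH_TYPES:
        pool = by_type[research_type]
        if len(pool) < per_type:
            raise ValueError(f"not enough articles for type: {research_type}")
        # random.sample draws without replacement.
        selected[research_type] = sorted(rng.sample(pool, per_type))
    return selected


# Hypothetical pool of 60 included articles; the 10-per-type split is an
# assumption, since the abstract does not report the group sizes.
articles = [(f"{t}-{i}", t) for t in RESEARCH_TYPES for i in range(10)]
selected = sample_per_type(articles)
total_selected = sum(len(ids) for ids in selected.values())
print(total_selected)
```

Sampling within each type, rather than from the pooled 60 articles, guarantees that every research type is equally represented among the 30 analyzed studies.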

RESULTS

There was no statistically significant difference between the mean scores of the models (p > 0.05). ChatGPT-4.0 had a numerically higher mean score (5.17 ± 1.12), especially in the data collection and measurement-assessment category. The Gemini model showed relatively more balanced performance in the statistical analysis category. The weighted kappa values ranged from 0.54 to 0.71, indicating moderate to substantial agreement between the raters. In the analysis by article type, Gemini's performance in Q1 varied significantly (p = 0.038).
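
For readers unfamiliar with the agreement statistic reported above, the following is a minimal sketch of Cohen's weighted kappa for two raters scoring the same items on an ordinal scale. The abstract does not state the weighting scheme or the scoring scale, so the quadratic default, the 1 to 7 scale, and the ratings below are assumptions invented for illustration, not study data.

```python
def weighted_kappa(rater_a, rater_b, categories, weights="quadratic"):
    """Cohen's weighted kappa for two raters on an ordinal scale.

    `categories` lists the scale values in order. `weights` is "linear"
    or "quadratic" and sets how strongly larger disagreements count.
    """
    if weights not in ("linear", "quadratic"):
        raise ValueError("weights must be 'linear' or 'quadratic'")
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must score the same non-empty item set")

    index = {c: i for i, c in enumerate(categories)}
    k = len(categories)
    n = len(rater_a)

    # Observed joint proportions of (rater A score, rater B score).
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[index[a]][index[b]] += 1.0 / n

    # Marginal proportions give the agreement expected by chance.
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    power = 2 if weights == "quadratic" else 1
    observed_disagreement = 0.0
    expected_disagreement = 0.0
    for i in range(k):
        for j in range(k):
            w = abs(i - j) ** power  # disagreement weight, 0 on the diagonal
            observed_disagreement += w * observed[i][j]
            expected_disagreement += w * row[i] * col[j]

    if expected_disagreement == 0:
        raise ValueError("kappa is undefined when both raters use one category")
    return 1.0 - observed_disagreement / expected_disagreement


# Hypothetical scores on an assumed 1-7 scale (not data from the study).
scale = [1, 2, 3, 4, 5, 6, 7]
rater_1 = [5, 6, 4, 7, 5, 3, 6, 5]
rater_2 = [5, 5, 4, 6, 6, 3, 6, 4]
kappa = weighted_kappa(rater_1, rater_2, scale)
print(round(kappa, 2))
```

Unlike unweighted kappa, the weighted form gives partial credit when the two raters differ by only one scale point, which suits ordinal rubric scores.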

CONCLUSION

Large language models such as ChatGPT and Gemini provide similarly consistent results in establishing the methodology of scientific studies in laryngology. Both models can be considered supportive tools; however, expert supervision is needed, especially for complex constructs such as statistical analysis. This study makes original contributions to the usability of LLMs for study design in laryngology.
