The use of ChatGPT and Google Gemini in responding to orthognathic surgery-related questions: A comparative study.

Authors

Ahmed A. Abdel Aziz, Hams H. Abdelrahman, Mohamed G. Hassan

Affiliations

Department of Orthodontics, Faculty of Dentistry, Assiut University, Assiut, Egypt.

Department of Pediatric Dentistry and Dental Public Health, Faculty of Dentistry, Alexandria University, Alexandria, Egypt.

Publication Information

J World Fed Orthod. 2025 Feb;14(1):20-26. doi: 10.1016/j.ejwf.2024.09.004. Epub 2024 Oct 28.

Abstract

AIM

This study used a quantitative approach to compare the reliability of responses from ChatGPT-3.5, ChatGPT-4, and Google Gemini to orthognathic surgery-related questions.

MATERIAL AND METHODS

The authors adapted a set of 64 questions covering all domains and aspects of orthognathic surgery. One author submitted the questions to ChatGPT-3.5, ChatGPT-4, and Google Gemini. The AI-generated responses from the three platforms were recorded and evaluated by two blinded, independent experts. Reliability was assessed with a tool measuring accuracy of information and completeness. In addition, the reviewers recorded whether each platform gave definitive answers to close-ended questions, provided references and graphical elements, and advised scheduling a consultation with a specialist.
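The rating protocol described above (two blinded raters scoring each platform's responses, then aggregating into a reliability score) can be sketched as follows. This is a minimal illustration only; the sample ratings, the 1-5 scale, and the averaging scheme are assumptions, not the study's actual instrument or data.

```python
from statistics import mean

# Hypothetical ratings: each of two blinded raters scores every
# AI-generated response (here on an assumed 1-5 scale; the actual
# scoring tool used in the study is not reproduced here).
ratings = {
    "ChatGPT-3.5": {"rater_1": [5, 4, 5], "rater_2": [4, 4, 5]},
    "ChatGPT-4":   {"rater_1": [4, 4, 4], "rater_2": [5, 4, 4]},
    "Gemini":      {"rater_1": [4, 5, 4], "rater_2": [4, 4, 4]},
}

def reliability_score(platform_ratings):
    """Average both raters' scores across all responses for one platform."""
    all_scores = [s for rater in platform_ratings.values() for s in rater]
    return mean(all_scores)

scores = {name: reliability_score(r) for name, r in ratings.items()}
```

In practice, a study like this would also report inter-rater agreement (e.g., a kappa statistic) before pooling the two raters' scores.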

RESULTS

Although ChatGPT-3.5 achieved the highest information reliability score, the three LLMs showed similar overall reliability in responding to orthognathic surgery-related inquiries. Google Gemini, however, significantly more often included recommendations to consult a physician and provided graphical elements; both ChatGPT-3.5 and ChatGPT-4 lacked these features.

CONCLUSION

This study shows that ChatGPT-3.5, ChatGPT-4, and Google Gemini can provide reliable responses to inquiries about orthognathic surgery. However, Google Gemini stood out by incorporating additional references and illustrations in its responses. These findings highlight the need for further evaluation of AI capabilities across different healthcare domains.

