Daraqel Baraa, Wafaie Khaled, Mohammed Hisham, Cao Li, Mheissen Samer, Liu Yang, Zheng Leilei
Department of Orthodontics, Stomatological Hospital of Chongqing Medical University Chongqing Key Laboratory of Oral Disease and Biomedical Sciences Chongqing Municipal Key Laboratory of Oral Biomedical Engineering of Higher Education, Chongqing, China; Oral Health Research and Promotion Unit, Al-Quds University, Jerusalem, Palestine.
Department of Orthodontics, Faculty of Dentistry, First Affiliated Hospital of Zhengzhou University, Zhengzhou, Henan, China.
Am J Orthod Dentofacial Orthop. 2024 Jun;165(6):652-662. doi: 10.1016/j.ajodo.2024.01.012. Epub 2024 Mar 15.
This study aimed to evaluate and compare the performance of 2 artificial intelligence (AI) models, Chat Generative Pretrained Transformer-3.5 (ChatGPT-3.5; OpenAI, San Francisco, Calif) and Google Bidirectional Encoder Representations from Transformers (Google Bard; Bard Experiment, Google, Mountain View, Calif), in terms of response accuracy, completeness, generation time, and response length when answering general orthodontic questions.
A team of orthodontic specialists developed a set of 100 questions spanning 10 orthodontic domains. One author submitted the questions to both ChatGPT and Google Bard. The AI-generated responses from both models were randomly assigned into 2 forms and sent to 5 blinded, independent assessors. The quality of the AI-generated responses was evaluated with a newly developed tool that rated accuracy of information and completeness. In addition, response generation time and response length were recorded.
The accuracy and completeness of responses were high in both AI models. The median accuracy score was 9 (interquartile range [IQR]: 8-9) for ChatGPT and 8 (IQR: 8-9) for Google Bard (median difference: 1; P <0.001). The median completeness score was similar in both models: 8 (IQR: 8-9) for ChatGPT and 8 (IQR: 7-9) for Google Bard. The odds of accuracy and completeness were 31% and 23% higher, respectively, for ChatGPT than for Google Bard. Google Bard's response generation time was significantly shorter than that of ChatGPT, by 10.4 seconds per question. However, the 2 models were similar in terms of generated response length.
Responses generated by both ChatGPT and Google Bard to the posed general orthodontic questions were rated with a high level of accuracy and completeness. However, acquiring answers was generally faster with the Google Bard model.