Daraqel Baraa, Wafaie Khaled, Mohammed Hisham, Cao Li, Mheissen Samer, Liu Yang, Zheng Leilei
Department of Orthodontics, Stomatological Hospital of Chongqing Medical University Chongqing Key Laboratory of Oral Disease and Biomedical Sciences Chongqing Municipal Key Laboratory of Oral Biomedical Engineering of Higher Education, Chongqing, China; Oral Health Research and Promotion Unit, Al-Quds University, Jerusalem, Palestine.
Department of Orthodontics, Faculty of Dentistry, First Affiliated Hospital of Zhengzhou University, Zhengzhou, Henan, China.
Am J Orthod Dentofacial Orthop. 2024 Jun;165(6):652-662. doi: 10.1016/j.ajodo.2024.01.012. Epub 2024 Mar 15.
This study aimed to evaluate and compare the performance of 2 artificial intelligence (AI) models, Chat Generative Pretrained Transformer-3.5 (ChatGPT-3.5; OpenAI, San Francisco, Calif) and Google Bidirectional Encoder Representations from Transformers (Google Bard; Bard Experiment, Google, Mountain View, Calif), in terms of response accuracy, completeness, generation time, and response length when answering general orthodontic questions.
A team of orthodontic specialists developed a set of 100 questions spanning 10 orthodontic domains. One author submitted the questions to both ChatGPT and Google Bard. The AI-generated responses from both models were randomly assigned into 2 forms and sent to 5 blinded, independent assessors. The quality of the AI-generated responses was evaluated with a newly developed tool that rated accuracy of information and completeness. In addition, response generation time and response length were recorded.
The accuracy and completeness of responses were high in both AI models. The median accuracy score was 9 (interquartile range [IQR]: 8-9) for ChatGPT and 8 (IQR: 8-9) for Google Bard (median difference: 1; P <0.001). The median completeness score was similar in both models: 8 (IQR: 8-9) for ChatGPT and 8 (IQR: 7-9) for Google Bard. The odds of accuracy and completeness were 31% and 23% higher, respectively, for ChatGPT than for Google Bard. Google Bard's response generation time was significantly shorter than that of ChatGPT, by 10.4 seconds per question. However, the 2 models were similar in terms of generated response length.
Responses generated by both ChatGPT and Google Bard to the posed general orthodontic questions were rated with a high level of accuracy and completeness. However, acquiring answers was generally faster with the Google Bard model.