Large Language Models in peri-implant disease: How well do they perform?

Authors

Koidou Vasiliki P, Chatzopoulos Georgios S, Tsalikis Lazaros, Kaklamanos Eleutherios G

Affiliations

Research Associate, Centre for Oral Immunobiology and Regenerative Medicine and Centre for Oral Clinical Research, Institute of Dentistry, Queen Mary University of London (QMUL), London, England, UK.

PhD candidate, Department of Preventive Dentistry, Periodontology and Implant Biology, School of Dentistry, Aristotle University of Thessaloniki, Thessaloniki, Greece; and Visiting Research Assistant Professor, Division of Periodontology, Department of Developmental and Surgical Sciences, School of Dentistry, University of Minnesota, Minneapolis, Minn.

Publication information

J Prosthet Dent. 2025 Mar 6. doi: 10.1016/j.prosdent.2025.02.008.

DOI:10.1016/j.prosdent.2025.02.008
PMID:40055086
Abstract

STATEMENT OF PROBLEM

Artificial intelligence (AI) has recently gained significant attention, and several AI applications, such as large language models (LLMs), are promising for use in clinical medicine and dentistry. Nevertheless, assessing the performance of LLMs is essential to identify potential inaccuracies and to prevent harmful outcomes.

PURPOSE

The purpose of this study was to evaluate and compare the evidence-based potential of answers provided by 4 LLMs to clinical questions in the field of implant dentistry.

MATERIAL AND METHODS

A total of 10 open-ended questions pertinent to the prevention and treatment of peri-implant disease were posed to 4 distinct LLMs: ChatGPT 4.0, Google Gemini, Google Gemini Advanced, and Microsoft Copilot. The answers were evaluated independently by 2 periodontists against scientific evidence for comprehensiveness, scientific accuracy, clarity, and relevance. The LLM responses received scores ranging from 0 (minimum) to 10 (maximum) points. To assess intra-evaluator reliability, the LLM responses were re-evaluated after 2 weeks, and Cronbach α and the intraclass correlation coefficient (ICC) were used (α=.05).
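As an illustration of the reliability check described above, the following minimal sketch computes Cronbach α for one evaluator's scores on two occasions. It is not the authors' analysis code: the function name `cronbach_alpha` and the `scores` values are hypothetical, and the abstract does not report the raw scores or the software used.

```python
from statistics import variance


def cronbach_alpha(ratings):
    """Cronbach's alpha for a list of rows.

    Each row holds the scores one LLM response received on each of the
    k rating occasions (here k = 2: initial evaluation and re-evaluation).
    """
    k = len(ratings[0])
    if k < 2:
        raise ValueError("at least two rating occasions are required")
    # Sample variance of each occasion (column) across all responses.
    columns = list(zip(*ratings))
    item_variance_sum = sum(variance(col) for col in columns)
    # Sample variance of the per-response total score.
    total_variance = variance([sum(row) for row in ratings])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical 0-10 scores for six responses: (first rating, rating after 2 weeks).
scores = [(7, 7), (5, 6), (8, 8), (4, 4), (9, 8), (6, 6)]
alpha = cronbach_alpha(scores)
print(f"Cronbach alpha: {alpha:.3f}")
```

Values close to 1 indicate that the two rating occasions agree closely. The ICC mentioned in the abstract is a separate statistic and is not computed here.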

RESULTS

The scores assigned by the examiners on the 2 occasions were not statistically different, and each LLM received an average score. Google Gemini Advanced ranked higher than the rest of the LLMs, while Google Gemini scored worst. The difference between Google Gemini Advanced and Google Gemini was statistically significant (P=.005).

CONCLUSIONS

Dental professionals need to be cautious when using LLMs to access content related to peri-implant diseases. LLMs cannot currently replace dental professionals, and caution should be exercised when they are used in patient care.

Similar articles

1
Large Language Models in peri-implant disease: How well do they perform?
J Prosthet Dent. 2025 Mar 6. doi: 10.1016/j.prosdent.2025.02.008.
2
Large language models in periodontology: Assessing their performance in clinically relevant questions.
J Prosthet Dent. 2024 Nov 18. doi: 10.1016/j.prosdent.2024.10.020.
3
Evaluation of the Performance of Generative AI Large Language Models ChatGPT, Google Bard, and Microsoft Bing Chat in Supporting Evidence-Based Dentistry: Comparative Mixed Methods Study.
J Med Internet Res. 2023 Dec 28;25:e51580. doi: 10.2196/51580.
4
Evidence-based potential of generative artificial intelligence large language models in orthodontics: a comparative study of ChatGPT, Google Bard, and Microsoft Bing.
Eur J Orthod. 2024 Apr 13. doi: 10.1093/ejo/cjae017.
5
Evaluating the evidence-based potential of six large language models in paediatric dentistry: a comparative study on generative artificial intelligence.
Eur Arch Paediatr Dent. 2025 Jun;26(3):527-535. doi: 10.1007/s40368-025-01012-x. Epub 2025 Feb 22.
6
Comparison of ChatGPT-4o, Google Gemini 1.5 Pro, Microsoft Copilot Pro, and Ophthalmologists in the management of uveitis and ocular inflammation: A comparative study of large language models.
J Fr Ophtalmol. 2025 Apr;48(4):104468. doi: 10.1016/j.jfo.2025.104468. Epub 2025 Mar 13.
7
Performance of the ChatGPT-3.5, ChatGPT-4, and Google Gemini large language models in responding to dental implantology inquiries.
J Prosthet Dent. 2025 Jan 4. doi: 10.1016/j.prosdent.2024.12.016.
8
Can Artificial Intelligence Language Models Effectively Address Dental Trauma Questions?
Dent Traumatol. 2025 Apr 1. doi: 10.1111/edt.13063.
9
Proficiency, Clarity, and Objectivity of Large Language Models Versus Specialists' Knowledge on COVID-19's Impacts in Pregnancy: Cross-Sectional Pilot Study.
JMIR Form Res. 2025 Feb 5;9:e56126. doi: 10.2196/56126.
10
Assessing the Responses of Large Language Models (ChatGPT-4, Gemini, and Microsoft Copilot) to Frequently Asked Questions in Breast Imaging: A Study on Readability and Accuracy.
Cureus. 2024 May 9;16(5):e59960. doi: 10.7759/cureus.59960. eCollection 2024 May.

Cited by

1
Clinical Applications of Artificial Intelligence in Periodontology: A Scoping Review.
Medicina (Kaunas). 2025 Jun 10;61(6):1066. doi: 10.3390/medicina61061066.
2
Evaluation of Large Language Model Performance in Answering Clinical Questions on Periodontal Furcation Defect Management.
Dent J (Basel). 2025 Jun 18;13(6):271. doi: 10.3390/dj13060271.