Evaluation of Large Language Model Performance in Answering Clinical Questions on Periodontal Furcation Defect Management.

Authors

Chatzopoulos Georgios S, Koidou Vasiliki P, Tsalikis Lazaros, Kaklamanos Eleftherios G

Affiliations

Department of Preventive Dentistry, Periodontology and Implant Biology, School of Dentistry, Aristotle University of Thessaloniki, 54124 Thessaloniki, Greece.

Division of Periodontology, Department of Developmental and Surgical Sciences, School of Dentistry, University of Minnesota, Minneapolis, MN 55455, USA.

Publication information

Dent J (Basel). 2025 Jun 18;13(6):271. doi: 10.3390/dj13060271.

DOI:10.3390/dj13060271
PMID:40559174
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12191798/
Abstract

Background/Objectives: Large Language Models (LLMs) are artificial intelligence (AI) systems that can process vast amounts of text and generate human-like language, offering the potential for improved information retrieval in healthcare. This study aimed to assess and compare the evidence-based potential of answers provided by four LLMs to common clinical questions concerning the management and treatment of periodontal furcation defects.

Methods: Four LLMs (ChatGPT 4.0, Google Gemini, Google Gemini Advanced, and Microsoft Copilot) were used to answer ten clinical questions related to periodontal furcation defects. The LLM-generated responses were compared against a "gold standard" derived from the European Federation of Periodontology (EFP) S3 guidelines and recent systematic reviews. Two board-certified periodontists independently evaluated the answers for comprehensiveness, scientific accuracy, clarity, and relevance using a predefined rubric and a scoring system of 0-10.

Results: LLM performance varied across the evaluation criteria. Google Gemini Advanced generally achieved the highest average scores, particularly in comprehensiveness and clarity, while Google Gemini and Microsoft Copilot tended to score lower, especially in relevance. However, the Kruskal-Wallis test revealed no statistically significant differences in the overall average scores among the LLMs. Inter-evaluator agreement and intra-evaluator reliability were high.

Conclusions: While LLMs demonstrate the potential to answer clinical questions related to furcation defect management, their performance varies. The LLMs differed in their degrees of comprehensiveness, scientific accuracy, clarity, and relevance. Dental professionals should be aware of LLMs' capabilities and limitations when seeking clinical information.
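
The abstract reports a Kruskal-Wallis test across the four LLMs but gives no computational detail. The following is a minimal sketch of how such a comparison could be run, not the authors' analysis. It assumes four independent groups of scores on the 0-10 scale; the group names `model_a` to `model_d` and every score below are hypothetical placeholders, not data from the study. The p-value helper uses the closed-form chi-square tail for 3 degrees of freedom, so it applies only to a four-group comparison.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis(groups):
    """Return (H, df) for a list of samples, with the standard tie correction."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    # Tie correction: 1 - sum(t^3 - t) / (n^3 - n) over groups of tied values.
    counts = {}
    for x in pooled:
        counts[x] = counts.get(x, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in counts.values()) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical; H is undefined")
    return h / correction, len(groups) - 1


def chi2_sf_df3(x):
    """Upper-tail probability of a chi-square variable with exactly 3 df."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


if __name__ == "__main__":
    # Hypothetical per-question scores (0-10), for illustration only.
    scores = {
        "model_a": [8, 7, 9, 8, 7, 8, 9, 7, 8, 8],
        "model_b": [7, 6, 8, 7, 7, 6, 8, 7, 7, 6],
        "model_c": [9, 8, 9, 8, 8, 9, 9, 8, 8, 9],
        "model_d": [7, 7, 6, 8, 7, 6, 7, 7, 8, 6],
    }
    h, df = kruskal_wallis(list(scores.values()))
    print(f"H = {h:.3f}, df = {df}, p = {chi2_sf_df3(h):.4f}")
```

Ranks are computed on the pooled scores because the test compares rank sums rather than means, which suits bounded ordinal ratings such as a 0-10 rubric. In practice a statistics library routine (for example `scipy.stats.kruskal`) would normally be used instead of a hand-written version.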

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e3e8/12191798/766ba5baf2fb/dentistry-13-00271-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e3e8/12191798/ba82a9870f5a/dentistry-13-00271-g001.jpg
