Accuracy of Large Language Models for Infective Endocarditis Prophylaxis in Dental Procedures.

Authors

Rewthamrongsris Paak, Burapacheep Jirayu, Trachoo Vorapat, Porntaveetus Thantrira

Affiliations

Department of Anatomy, Faculty of Dentistry, Chulalongkorn University, Bangkok, Thailand.

Stanford University, Stanford, California, USA.

Publication information

Int Dent J. 2025 Feb;75(1):206-212. doi: 10.1016/j.identj.2024.09.033. Epub 2024 Oct 12.

DOI:10.1016/j.identj.2024.09.033
PMID:39395898
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11806337/
Abstract

PURPOSE

Infective endocarditis (IE) is a serious, life-threatening condition that requires antibiotic prophylaxis for high-risk individuals undergoing invasive dental procedures. As large language models (LLMs) are rapidly adopted by dental professionals for their efficiency and accessibility, it is crucial to assess how accurately they answer critical questions about antibiotic prophylaxis for IE prevention.

METHODS

Twenty-eight true/false questions based on the 2021 American Heart Association (AHA) guidelines for IE were posed to 7 popular LLMs. Each model underwent five independent runs per question under each of two prompt strategies: with a pre-prompt casting the model as an experienced dentist, and without a pre-prompt. Inter-model comparisons used the Kruskal-Wallis test, followed by post-hoc pairwise comparisons, performed in Prism 10 software.
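To make the inter-model comparison concrete, the following is a minimal sketch of the Kruskal-Wallis H statistic in plain Python. It is not the authors' analysis, which was run in Prism 10. The abstract does not say what the unit of analysis was, so the sketch assumes one score per run for each model, and the sample scores and model names (`model_a`, `model_b`, `model_c`) are invented purely for illustration.

```python
from collections import Counter


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Tie-corrected Kruskal-Wallis H for a list of samples (one per group)."""
    pooled = [x for group in groups for x in group]
    n = len(pooled)
    ranks = average_ranks(pooled)

    # Sum of (rank sum squared / group size) over all groups.
    total = 0.0
    start = 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        total += rank_sum ** 2 / len(group)
        start += len(group)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)

    # Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n).
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1 - ties / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical; H is undefined")
    return h / correction


# Hypothetical per-run correct counts (out of 28) for three made-up models.
runs = {
    "model_a": [22, 23, 22, 21, 23],
    "model_b": [20, 19, 21, 20, 20],
    "model_c": [18, 17, 19, 18, 17],
}
h_stat = kruskal_wallis_h(list(runs.values()))
print(f"H = {h_stat:.3f}, degrees of freedom = {len(runs) - 1}")
```

The statistic is compared against a chi-square distribution with (number of groups - 1) degrees of freedom. A significant result only says that at least one model differs, which is why post-hoc pairwise comparisons follow.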

RESULTS

Accuracy differed significantly among the LLMs. All LLMs had a narrower confidence interval with a pre-prompt, and most, with the exception of Claude 3 Opus, performed better with one. GPT-4o had the highest accuracy (80% with a pre-prompt, 78.57% without), followed by Gemini 1.5 Pro (78.57% and 77.86%) and Claude 3 Opus (75.71% and 77.14%). Gemini 1.5 Flash had the lowest accuracy (68.57% and 63.57%). Without a pre-prompt, Gemini 1.5 Flash's accuracy was significantly lower than that of Claude 3 Opus, Gemini 1.5 Pro, and GPT-4o. With a pre-prompt, Gemini 1.5 Flash and Claude 3.5 Sonnet were significantly less accurate than Gemini 1.5 Pro and GPT-4o. None of the LLMs met the commonly used benchmark scores. All models gave both correct and incorrect answers randomly, except Claude 3.5 Sonnet with a pre-prompt, which consistently gave incorrect answers to eight questions across five runs.
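The reported percentages can be read as whole-number counts. The abstract does not state the denominator, so this sketch assumes accuracy is pooled over 28 questions × 5 runs = 140 responses per model and prompt condition. Under that assumption, every reported figure rounds to an integer number of correct answers. The helper names `implied_correct` and `consistent` are illustrative only.

```python
QUESTIONS = 28
RUNS = 5
TOTAL = QUESTIONS * RUNS  # 140 responses per model and condition (assumed)

# Accuracies from the abstract: (with pre-prompt, without pre-prompt), in percent.
reported = {
    "GPT-4o": (80.00, 78.57),
    "Gemini 1.5 Pro": (78.57, 77.86),
    "Claude 3 Opus": (75.71, 77.14),
    "Gemini 1.5 Flash": (68.57, 63.57),
}


def implied_correct(pct, total=TOTAL):
    """Nearest whole number of correct answers for a reported percentage."""
    return round(pct / 100 * total)


def consistent(pct, total=TOTAL):
    """True if the implied count reproduces the percentage to two decimals."""
    k = implied_correct(pct, total)
    return abs(100 * k / total - pct) < 0.005


for model, (with_pp, without_pp) in reported.items():
    print(
        f"{model}: {implied_correct(with_pp)}/{TOTAL} with pre-prompt, "
        f"{implied_correct(without_pp)}/{TOTAL} without"
    )
# GPT-4o: 112/140 with pre-prompt, 110/140 without
# Gemini 1.5 Pro: 110/140 with pre-prompt, 109/140 without
# Claude 3 Opus: 106/140 with pre-prompt, 108/140 without
# Gemini 1.5 Flash: 96/140 with pre-prompt, 89/140 without
```

Read this way, the gap between the best and worst model without a pre-prompt is 21 answers out of 140, and GPT-4o's gain from the pre-prompt is 2 answers.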

CONCLUSION

LLMs such as GPT-4o show promise for retrieving AHA IE guideline information, reaching up to 80% accuracy. However, complex medical questions may still pose a challenge. Pre-prompts offer a potential solution, and domain-specific training is essential for optimizing LLM performance in healthcare, especially as models with larger token limits emerge.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4b94/11806337/b81c4baffdeb/gr2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4b94/11806337/da98ef442388/gr1.jpg
