Leveraging Large Language Models in the delivery of post-operative dental care: a comparison between an embedded GPT model and ChatGPT.

Author information

Batool Itrat, Naved Nighat, Kazmi Syed Murtaza Raza, Umer Fahad

Affiliations

Section of Dentistry, Department of Surgery, Aga Khan University Hospital, Karachi, Pakistan.

Publication information

BDJ Open. 2024 Jun 12;10(1):48. doi: 10.1038/s41405-024-00226-3.

DOI:10.1038/s41405-024-00226-3

PMID:38866751

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11169374/

Abstract

OBJECTIVE

This study underscores the transformative role of Artificial Intelligence (AI) in healthcare, particularly the promising applications of Large Language Models (LLMs) in the delivery of post-operative dental care. The aim is to evaluate the performance of an embedded GPT model and compare it with ChatGPT-3.5 turbo. The assessment focuses on response accuracy, clarity, relevance, and up-to-date knowledge in addressing patient concerns and facilitating informed decision-making.

MATERIAL AND METHODS

An embedded GPT model, employing GPT-3.5-16k, was built with GPT-trainer to answer postoperative questions in four dental specialties: Operative Dentistry & Endodontics, Periodontics, Oral & Maxillofacial Surgery, and Prosthodontics. The generated responses were validated by thirty-six dental experts, nine from each specialty, using a Likert scale to assess the embedded GPT model's performance and compare it with GPT-3.5 turbo. For content validation, a quantitative Content Validity Index (CVI) was used. The CVI was calculated at both the item level (I-CVI) and the scale level (S-CVI/Ave). To adjust the I-CVI for chance agreement, a modified kappa statistic (K*) was computed.
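
The content-validity calculation described above can be sketched in Python. This is a minimal illustration, not the authors' analysis code. It assumes a 4-point Likert scale on which ratings of 3 or 4 count as agreement (the abstract states neither the scale length nor the cut-off). It uses the commonly cited definitions I-CVI = A/N, S-CVI/Ave = mean of the I-CVIs, and K* = (I-CVI - Pc)/(1 - Pc) with Pc = C(N, A) x 0.5^N, where N is the number of experts and A the number who agree. The function names are hypothetical and the ratings are invented for demonstration.

```python
from math import comb


def count_agree(ratings, threshold=3):
    """Number of experts whose rating is at or above the agreement threshold (assumed: 3)."""
    return sum(1 for r in ratings if r >= threshold)


def i_cvi(ratings, threshold=3):
    """Item-level CVI: proportion of experts who rate the item as acceptable."""
    if not ratings:
        raise ValueError("at least one rating is required")
    return count_agree(ratings, threshold) / len(ratings)


def modified_kappa(ratings, threshold=3):
    """I-CVI adjusted for chance agreement (K*)."""
    n = len(ratings)
    a = count_agree(ratings, threshold)
    # Probability that exactly `a` of `n` experts agree by chance (p = 0.5 each).
    p_chance = comb(n, a) * 0.5 ** n
    return (i_cvi(ratings, threshold) - p_chance) / (1 - p_chance)


def s_cvi_ave(items, threshold=3):
    """Scale-level CVI (averaging method): mean of the item-level CVIs."""
    if not items:
        raise ValueError("at least one item is required")
    return sum(i_cvi(r, threshold) for r in items) / len(items)


# Invented ratings from nine experts for two responses (not study data).
item_1 = [4, 4, 3, 4, 3, 4, 4, 3, 2]  # 8 of 9 experts agree
item_2 = [4, 3, 2, 2, 3, 4, 1, 3, 4]  # 6 of 9 experts agree

for name, ratings in (("item 1", item_1), ("item 2", item_2)):
    print(name, round(i_cvi(ratings), 3), round(modified_kappa(ratings), 3))
    # item 1 -> 0.889 0.887
    # item 2 -> 0.667 0.601

print("S-CVI/Ave", round(s_cvi_ave([item_1, item_2]), 3))  # -> 0.778
```

With nine raters the chance correction is small when agreement is near-unanimous (item 1) and more noticeable when agreement is moderate (item 2), which is why K* is reported alongside the raw I-CVI.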

RESULTS

The overall content validity of responses generated by the embedded GPT model and ChatGPT was 65.62% and 61.87%, respectively. The embedded GPT model outperformed ChatGPT, with an accuracy of 62.5% and clarity of 72.5%, compared with an accuracy of 52.5% and clarity of 67.5% for ChatGPT. However, both models performed equally well in terms of relevance and up-to-date knowledge.

CONCLUSION

In conclusion, the embedded GPT model showed better results than ChatGPT in providing post-operative dental care, underscoring the benefits of embedding and prompt engineering and paving the way for future advancements in healthcare applications.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2c32/11169374/0d1cec645ea9/41405_2024_226_Fig1_HTML.jpg
