Leveraging Large Language Models in the delivery of post-operative dental care: a comparison between an embedded GPT model and ChatGPT.

Authors

Batool Itrat, Naved Nighat, Kazmi Syed Murtaza Raza, Umer Fahad

Affiliation

Section of Dentistry, Department of Surgery, Aga Khan University Hospital, Karachi, Pakistan.

Publication information

BDJ Open. 2024 Jun 12;10(1):48. doi: 10.1038/s41405-024-00226-3.

Abstract

OBJECTIVE

This study underscores the transformative role of Artificial Intelligence (AI) in healthcare, particularly the promising applications of Large Language Models (LLMs) in the delivery of post-operative dental care. The aim is to evaluate the performance of an embedded GPT model and compare it with ChatGPT-3.5 turbo. The assessment focuses on response accuracy, clarity, relevance, and up-to-date knowledge in addressing patient concerns and facilitating informed decision-making.

MATERIAL AND METHODS

An embedded GPT model, based on GPT-3.5-16k, was built with GPT-trainer to answer postoperative questions in four dental specialties: Operative Dentistry & Endodontics, Periodontics, Oral & Maxillofacial Surgery, and Prosthodontics. Thirty-six dental experts, nine from each specialty, rated the generated responses on a Likert scale, allowing the embedded GPT model's performance to be compared with that of GPT-3.5 turbo. For content validation, a quantitative Content Validity Index (CVI) was used. The CVI was calculated at both the item level (I-CVI) and the scale level (S-CVI/Ave). To adjust the I-CVI for chance agreement, a modified kappa statistic (K*) was computed.
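As a rough illustration of these indices, the sketch below computes I-CVI, S-CVI/Ave, and K* using the commonly cited definitions (I-CVI as the proportion of experts who agree on an item, S-CVI/Ave as the mean of the I-CVIs, and K* = (I-CVI - Pc) / (1 - Pc), where Pc is the binomial probability of chance agreement). It assumes a 4-point Likert scale on which ratings of 3 or 4 count as agreement; the abstract does not state the scale length or cutoff, so `agree_min`, the function names, and the sample ratings are all hypothetical and are not the authors' code or data.

```python
from math import comb


def item_cvi(ratings, agree_min=3):
    """I-CVI: share of experts whose rating meets the agreement cutoff.

    agree_min=3 assumes a 4-point scale (an assumption, not from the study).
    """
    n_agree = sum(1 for r in ratings if r >= agree_min)
    return n_agree / len(ratings)


def scale_cvi_ave(items, agree_min=3):
    """S-CVI/Ave: mean of the I-CVI values across all items."""
    return sum(item_cvi(r, agree_min) for r in items) / len(items)


def modified_kappa(ratings, agree_min=3):
    """K*: I-CVI adjusted for chance agreement.

    Pc = C(N, A) * 0.5**N, with N experts and A experts in agreement.
    """
    n = len(ratings)
    a = sum(1 for r in ratings if r >= agree_min)
    p_chance = comb(n, a) * 0.5 ** n
    return (a / n - p_chance) / (1 - p_chance)


if __name__ == "__main__":
    # Hypothetical ratings from nine experts for two items.
    item_a = [4, 4, 3, 4, 3, 4, 2, 4, 3]
    item_b = [4, 2, 3, 1, 3, 4, 2, 4, 3]
    print("I-CVI (item A):", round(item_cvi(item_a), 3))
    print("K* (item A):", round(modified_kappa(item_a), 3))
    print("S-CVI/Ave:", round(scale_cvi_ave([item_a, item_b]), 3))
```

With nine raters per specialty, as in this study, the chance term Pc is small when most experts agree, so K* stays close to the I-CVI in that case.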

RESULTS

The overall content validity of the responses generated by the embedded GPT model and ChatGPT was 65.62% and 61.87%, respectively. The embedded GPT model outperformed ChatGPT in accuracy (62.5% vs. 52.5%) and clarity (72.5% vs. 67.5%). However, both models performed equally well in terms of relevance and up-to-date knowledge.

CONCLUSION

In conclusion, the embedded GPT model showed better results than ChatGPT in providing post-operative dental care, highlighting the benefits of embedding and prompt engineering and paving the way for future advancements in healthcare applications.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2c32/11169374/0d1cec645ea9/41405_2024_226_Fig1_HTML.jpg
