Batool Itrat, Naved Nighat, Kazmi Syed Murtaza Raza, Umer Fahad
Section of Dentistry, Department of Surgery, Aga Khan University Hospital, Karachi, Pakistan.
BDJ Open. 2024 Jun 12;10(1):48. doi: 10.1038/s41405-024-00226-3.
This study underscores the transformative role of Artificial Intelligence (AI) in healthcare, particularly the promising applications of Large Language Models (LLMs) in the delivery of post-operative dental care. The aim is to evaluate the performance of an embedded GPT model and compare it with ChatGPT-3.5 turbo. The assessment focuses on response accuracy, clarity, relevance, and up-to-date knowledge in addressing patient concerns and facilitating informed decision-making.
An embedded GPT model, based on GPT-3.5-16k, was built with GPT-trainer to answer postoperative questions in four dental specialties: Operative Dentistry & Endodontics, Periodontics, Oral & Maxillofacial Surgery, and Prosthodontics. The generated responses were rated on a Likert scale by thirty-six dental experts, nine from each specialty, to assess the embedded GPT model's performance and compare it with ChatGPT-3.5 turbo. Content validity was quantified with the Content Validity Index (CVI), calculated at both the item level (I-CVI) and the scale level (S-CVI/Ave). To adjust the I-CVI for chance agreement, a modified kappa statistic (K*) was computed.
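The content-validity measures named above can be illustrated with a minimal Python sketch. It follows the commonly used definitions (I-CVI as the proportion of experts rating an item as relevant, S-CVI/Ave as the mean of the I-CVIs, and K* as the I-CVI corrected for the binomial probability of chance agreement); the abstract does not state the exact formulas or Likert cut-off the authors applied, so the 4-point scale, the `relevant_min=3` threshold, the function names, and the example ratings are all assumptions for illustration only.

```python
from math import comb


def count_agreeing(ratings, relevant_min=3):
    """Number of experts whose rating counts as 'relevant'.

    Assumption: a 4-point scale where ratings >= 3 mean 'relevant'.
    """
    return sum(1 for r in ratings if r >= relevant_min)


def item_cvi(ratings, relevant_min=3):
    """I-CVI: proportion of experts rating the item as relevant."""
    return count_agreeing(ratings, relevant_min) / len(ratings)


def modified_kappa(ratings, relevant_min=3):
    """K*: I-CVI adjusted for chance agreement.

    P_c = C(N, A) * 0.5**N, where N is the number of experts and
    A the number who rated the item as relevant.
    K* = (I-CVI - P_c) / (1 - P_c)
    """
    n = len(ratings)
    a = count_agreeing(ratings, relevant_min)
    p_chance = comb(n, a) * 0.5 ** n
    return (a / n - p_chance) / (1 - p_chance)


def scale_cvi_ave(items, relevant_min=3):
    """S-CVI/Ave: mean of the I-CVIs across all items."""
    return sum(item_cvi(r, relevant_min) for r in items) / len(items)


# Hypothetical ratings from nine experts for two items (not study data).
example_items = [
    [4, 4, 3, 3, 4, 3, 4, 2, 4],  # 8 of 9 experts rate the item relevant
    [4, 3, 3, 4, 4, 3, 4, 3, 4],  # 9 of 9 experts rate the item relevant
]

for ratings in example_items:
    print(round(item_cvi(ratings), 3), round(modified_kappa(ratings), 3))
print(round(scale_cvi_ave(example_items), 3))
```

With nine raters per item, as in each specialty panel here, the chance-agreement term is small, so K* stays close to the I-CVI unless the raters are split nearly evenly.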
The overall content validity of the responses generated by the embedded GPT model and ChatGPT was 65.62% and 61.87%, respectively. The embedded GPT model outperformed ChatGPT on accuracy (62.5% vs. 52.5%) and clarity (72.5% vs. 67.5%). However, both models performed equally well in terms of relevance and up-to-date knowledge.
In conclusion, the embedded GPT model performed better than ChatGPT in providing post-operative dental care, highlighting the benefits of embedding and prompt engineering and paving the way for future advancements in healthcare applications.