
Comparing Artificial Intelligence and Senior Residents in Oral Lesion Diagnosis: A Comparative Study.

Authors

Albagieh Hamad, Alzeer Zaid O, Alasmari Osama N, Alkadhi Abdullah A, Naitah Abdulaziz N, Almasaad Khaled F, Alshahrani Turki S, Alshahrani Khalid S, Almahmoud Mohammed I

Affiliations

Oral Medicine, King Saud University, Riyadh, SAU.

Dentistry, College of Dentistry, King Saud University, Riyadh, SAU.

Publication

Cureus. 2024 Jan 3;16(1):e51584. doi: 10.7759/cureus.51584. eCollection 2024 Jan.


DOI: 10.7759/cureus.51584
PMID: 38173951
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10763647/
Abstract

INTRODUCTION: Artificial intelligence (AI) is a field of computer science that seeks to build intelligent machines capable of carrying out tasks that usually require human intelligence. AI may help dentists with a variety of dental tasks, including clinical diagnosis and treatment planning. This study aims to compare the performance of AI and oral medicine residents in diagnosing different cases and providing treatment, and to determine whether AI can reliably assist them in their field of work.

METHODS: The study conducted a comparative analysis of the responses of third- and fourth-year residents trained in Oral Medicine and Pathology at King Saud University, College of Dentistry. The residents were given a closed multiple-choice test consisting of 19 questions with four response options (A-D) and one question with five response options (A-E). The test was administered via Google Forms, and each resident's responses were stored electronically in an Excel sheet (Microsoft Corp., Redmond, WA). The residents' answers were then compared with the responses generated by three major language models: OpenAI, Stablediffusion, and PopAI. The questions were input into the language models in the same format as the original test, and a new chat session was created before each question to eliminate memory retention bias. The input was performed on November 19, 2023, the same day the official multiple-choice test was administered. The sample comprised 20 residents trained in Oral Medicine and Pathology at King Saud University, College of Dentistry, consisting of both third-year and fourth-year residents.

RESULT: The analysis covered the responses of three large language models (LLMs), namely OpenAI, Stablediffusion, and PopAI, as well as the responses of 20 senior residents to 20 clinical cases on oral lesion diagnosis. Significant variation was observed in the responses to only two questions (10%); for the remaining questions there were no significant differences. The median (IQR) score of the LLMs was 50.0 (45.0 to 60.0), with a minimum of 40 (Stablediffusion) and a maximum of 70 (OpenAI). The median (IQR) score of the senior residents was 65.0 (55.0 to 75.0); their lowest and highest scores were 40 and 90, respectively. There was no significant difference between the percent scores of residents and LLMs (p = 0.211). Agreement was measured using the Kappa statistic. Agreement among senior dental residents was weak (Kappa = 0.396). In contrast, agreement among the LLMs was moderate (Kappa = 0.622), suggesting a more cohesive alignment of responses among the AI models. When the residents' responses were compared with those generated by each model individually (OpenAI, Stablediffusion, and PopAI), agreement was consistently weak, with Kappa values of 0.402, 0.381, and 0.392, respectively.

CONCLUSION: The current study found no significant difference in response scores between residents and LLMs; by contrast, the agreement analysis showed low agreement among the residents and high agreement among the LLMs. Dentists should consider that AI can be very beneficial in providing diagnoses and treatment plans, and may use it to assist them.
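The agreement analysis above reports Kappa values for 20 raters answering multiple-choice questions. For more than two raters this is conventionally computed as Fleiss' kappa; the abstract does not specify which Kappa formulation was used, so that is an assumption. A minimal, self-contained sketch of the computation:

```python
from collections import Counter

def fleiss_kappa(ratings):
    """Fleiss' kappa for agreement among multiple raters.

    ratings: one list per question, each containing the answer chosen
    by every rater for that question (e.g. ["A", "B", "A", ...]).
    Every question must be answered by the same number of raters.
    """
    n_questions = len(ratings)
    n_raters = len(ratings[0])
    categories = sorted({ans for row in ratings for ans in row})

    # Accumulate per-question agreement P_i and overall category counts.
    p_i_sum = 0.0
    category_counts = Counter()
    for row in ratings:
        counts = Counter(row)
        category_counts.update(counts)
        # P_i = (sum_j n_ij^2 - n) / (n * (n - 1))
        p_i_sum += (sum(c * c for c in counts.values()) - n_raters) / (
            n_raters * (n_raters - 1)
        )

    p_bar = p_i_sum / n_questions          # mean observed agreement
    total = n_questions * n_raters
    p_e = sum((category_counts[c] / total) ** 2 for c in categories)
    return (p_bar - p_e) / (1 - p_e)       # chance-corrected agreement
```

Applied to the residents' 20-question answer sheets (a 20-question by 20-rater table), a value near 0.396 would reproduce the "weak agreement" finding; values around 0.41 to 0.60 are usually read as moderate, matching the 0.622 reported for the LLMs.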


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/92cd/10763647/df0a82985f7d/cureus-0016-00000051584-i02.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/92cd/10763647/ab1e37041f67/cureus-0016-00000051584-i01.jpg



