Evaluation of the ability of large language models to self-diagnose oral diseases.

Author information

Zhuang Shiyang, Zeng Yuanhao, Lin Shaojunjie, Chen Xirui, Xin Yishan, Li Hongyan, Lin Yiming, Zhang Chaofan, Lin Yunzhi

Affiliations

Department of Stomatology, the First Affiliated Hospital, Fujian Medical University, Fuzhou 350005, China.

Department of Stomatology, National Regional Medical Center, Binhai Campus of the First Affiliated Hospital, Fujian Medical University, Fuzhou 350212, China.

Publication information

iScience. 2024 Nov 29;27(12):111495. doi: 10.1016/j.isci.2024.111495. eCollection 2024 Dec 20.

DOI:10.1016/j.isci.2024.111495

PMID:39758998

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11699252/

Abstract

Large language models (LLMs) offer potential in primary dental care. We evaluated the diagnostic capabilities of LLMs across various oral diseases and contexts. All LLMs showed diagnostic capabilities for temporomandibular joint disorders, periodontal disease, dental caries, and malocclusion. The prompts did not affect the performance of ChatGPT 3.5. When Chinese was used, the diagnostic ability of ChatGPT 3.5 for pulpitis improved (0% vs. 61.7%, p < 0.001), while its ability to diagnose pericoronitis decreased (8% vs. 0%, p < 0.001). For ChatGPT 4.0, using Chinese improved both (0% vs. 92% and 8% vs. 72%, respectively; p < 0.001). Claude 2 exhibited the highest accuracy in diagnosing pulpitis (36%, p = 0.048), and ChatGPT 4.0 showed complete diagnostic capability for pericoronitis. Llama 2 and Claude 3.5 Sonnet exhibited complete diagnostic capability for oral cancer. In conclusion, LLMs may be a potential tool for daily dental care but need further updates.
