Development and Evaluation of a GPT4-Based Orofacial Pain Clinical Decision Support System.

Author information

Vueghs Charlotte, Shakeri Hamid, Renton Tara, Van der Cruyssen Frederic

Affiliations

Department of Oral and Maxillofacial Surgery, University Hospitals Leuven, 3000 Leuven, Belgium.

Department of Oral Surgery, King's College London Dental Institute, London SE5 9RW, UK.

Publication information

Diagnostics (Basel). 2024 Dec 17;14(24):2835. doi: 10.3390/diagnostics14242835.

Abstract

Background: Orofacial pain (OFP) encompasses a complex array of conditions affecting the face, mouth, and jaws, and often poses significant diagnostic challenges with high rates of misdiagnosis. Artificial intelligence, particularly large language models such as GPT4 (OpenAI, San Francisco, CA, USA), has potential as a diagnostic aid in healthcare settings.

Objectives: To evaluate the diagnostic accuracy of GPT4 in OFP cases as a clinical decision support system (CDSS) and to compare its performance against treating clinicians, expert evaluators, medical students, and general practitioners.

Methods: A total of 100 anonymized patient case descriptions involving diverse OFP conditions were collected. GPT4 was prompted to generate primary and differential diagnoses for each case using the International Classification of Orofacial Pain (ICOP) criteria. Diagnoses were compared with gold-standard diagnoses established by the treating clinicians, and a scoring system was used to assess accuracy at three hierarchical ICOP levels. A subset of 24 cases was also evaluated by two clinical experts, two final-year medical students, and two general practitioners for comparative analysis. Diagnostic performance and interrater reliability were calculated.

Results: GPT4 achieved the highest accuracy level (ICOP level 3) in 38% of cases, with an overall diagnostic performance score of 157 out of 300 points (52%). The model provided accurate differential diagnoses in 80% of cases (400 out of 500 points). In the subset of 24 cases, the model's performance was comparable to that of the non-expert human evaluators but was surpassed by the clinical experts, who correctly diagnosed 54% of cases at level 3. GPT4 demonstrated high accuracy in specific categories, correctly diagnosing 81% of trigeminal neuralgia cases at level 3. Interrater reliability between GPT4 and the human evaluators was low (κ = 0.219, p < 0.001), indicating variability in diagnostic agreement.

Conclusions: GPT4 shows promise as a CDSS for OFP by improving diagnostic accuracy and offering structured differential diagnoses. While it does not yet outperform expert clinicians, GPT4 can augment diagnostic workflows, particularly in primary care or educational settings. Effective integration into clinical practice requires adherence to rigorous guidelines, thorough validation, and ongoing professional oversight to ensure patient safety and diagnostic reliability.
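The abstract reports interrater reliability as a kappa value but does not state which kappa statistic was used. As a minimal sketch, assuming the two-rater Cohen's kappa, κ = (p_o − p_e) / (1 − p_e), where p_o is the observed agreement and p_e is the agreement expected by chance from each rater's label frequencies, the Python below computes κ for one categorical diagnosis per case. The function name `cohens_kappa` and the example labels are hypothetical illustrations, not the study's method or data.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Two-rater Cohen's kappa for one categorical label per case (illustrative sketch)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of cases")
    n = len(rater_a)
    # Observed agreement: fraction of cases where both raters chose the same label.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label proportions, summed over labels.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        # Degenerate case: both raters used one identical label for every case,
        # so kappa is undefined; return 1.0 by convention.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical labels for six cases (NOT study data):
# TN = trigeminal neuralgia, TMD = temporomandibular disorder,
# PDAP = persistent dentoalveolar pain, BMS = burning mouth syndrome.
gpt4_labels = ["TN", "TN", "TMD", "PDAP", "TMD", "BMS"]
clinician_labels = ["TN", "TMD", "TMD", "PDAP", "TN", "BMS"]
print(round(cohens_kappa(gpt4_labels, clinician_labels), 3))  # prints 0.538
```

With these hypothetical labels the raters agree on 4 of 6 cases (p_o ≈ 0.667) against a chance agreement of p_e ≈ 0.278, giving κ ≈ 0.538. When more than two raters label the same cases, as in the 24-case subset, Fleiss' kappa is a common alternative; for the two-rater case, scikit-learn's `sklearn.metrics.cohen_kappa_score` provides a tested implementation.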

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fab0/11674870/f334c575cb2d/diagnostics-14-02835-g001.jpg