Large Language Models in the Diagnosis of Hand and Peripheral Nerve Injuries: An Evaluation of ChatGPT and the Isabel Differential Diagnosis Generator.

Author information

AlShenaiber Abdullah, Datta Shaishav, Mosa Adam J, Binhammer Paul A, Ing Edsel B

Affiliations

Temerty Faculty of Medicine, University of Toronto, Toronto, ON, Canada.

Division of Plastic, Reconstructive & Aesthetic Surgery, Department of Surgery, University of Toronto, Toronto, ON, Canada.

Publication information

J Hand Surg Glob Online. 2024 Sep 3;6(6):847-854. doi: 10.1016/j.jhsg.2024.07.011. eCollection 2024 Nov.

DOI:10.1016/j.jhsg.2024.07.011

PMID:39703593

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC11652307/

Abstract

PURPOSE

Tools using artificial intelligence may help reduce missed or delayed diagnoses and improve patient care in hand surgery. This study aimed to compare and evaluate the performance of two natural language processing programs, Isabel and ChatGPT-4, in diagnosing hand and peripheral nerve injuries from a set of clinical vignettes.

METHODS

Cases from a virtual library of hand surgery case reports were included if the patient had no history of trauma or previous surgery. The clinical details (age, sex, symptoms, signs, and medical history) of 16 hand cases were entered into Isabel and ChatGPT-4 to generate top-10 differential diagnosis lists. Isabel and ChatGPT-4 were compared on whether each list included the correct diagnosis and on the median rank of the correct diagnosis within the lists. Two hand surgeons were then given each list and asked to independently evaluate the performance of the two systems.
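The comparison described above reduces to two figures per system: how many of the lists contain the correct diagnosis, and the median (with interquartile range) of its position when it is present. The Python sketch below is illustrative only. The function name `summarize_ranks` is hypothetical, and it assumes that the median and IQR are computed over included cases only and that Python's default quartile method is used; the abstract specifies neither. The example ranks are invented and are not the study's data.

```python
from statistics import median, quantiles
from typing import Optional, Sequence


def summarize_ranks(ranks: Sequence[Optional[int]], list_size: int = 10) -> dict:
    """Summarize where the correct diagnosis fell in each top-N differential list.

    ranks[i] is the 1-based position of the correct diagnosis for case i,
    or None if it was absent from that case's top-`list_size` list.
    (Hypothetical helper for illustration, not the study's code.)
    """
    # Only ranks inside the top-N list count as "correctly identified".
    hits = [r for r in ranks if r is not None and 1 <= r <= list_size]
    summary = {"included": len(hits), "total": len(ranks)}
    if not hits:
        summary["median_rank"] = None
        summary["iqr"] = None
        return summary
    # Assumption: median and IQR are taken over included cases only.
    summary["median_rank"] = median(hits)
    if len(hits) >= 2:
        # Assumption: Python's default ('exclusive') quartile method; the
        # study's method is not stated, and other methods can give another IQR.
        q1, _, q3 = quantiles(hits, n=4)
        summary["iqr"] = q3 - q1
    else:
        # A single included case has no spread.
        summary["iqr"] = 0
    return summary


# Hypothetical ranks for six made-up cases (NOT the study's data):
# the correct diagnosis is missing from the lists for cases 3 and 5.
example = [1, 3, None, 2, None, 1]
print(summarize_ranks(example))
```

Because quartile conventions differ between statistical packages, the IQR from this sketch may not match a value computed with different software on the same ranks.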

RESULTS

Isabel correctly identified 7/16 (44%) of cases, with a median rank of two (interquartile range = 3). ChatGPT-4 correctly identified 14/16 (88%) of cases, with a median rank of one (interquartile range = 1). Physicians one and two preferred the lists generated by ChatGPT-4 in 12/16 (75%) and 13/16 (81%) of cases, respectively, and had no preference in 2/16 (13%) of cases.
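The percentages above match rounding to the nearest whole percent with halves rounded up (2/16 = 12.5% is reported as 13%). The short check below assumes that convention. The helper name `pct_half_up` is hypothetical, and the counts come from the results above.

```python
def pct_half_up(k: int, n: int) -> int:
    """Return k/n as a whole-number percentage, rounding halves upward."""
    # floor(100*k/n + 0.5) in exact integer arithmetic; this avoids both float
    # error and Python's round-half-to-even, which would turn 12.5 into 12.
    return (200 * k + n) // (2 * n)


reported = {
    "Isabel correct": (7, 16),
    "ChatGPT-4 correct": (14, 16),
    "Physician one prefers ChatGPT-4": (12, 16),
    "Physician two prefers ChatGPT-4": (13, 16),
    "No preference": (2, 16),
}
for label, (k, n) in reported.items():
    print(f"{label}: {k}/{n} = {pct_half_up(k, n)}%")
```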

CONCLUSIONS

ChatGPT-4 had significantly greater diagnostic accuracy within our sample (P < .05) and generated higher quality differential diagnoses than Isabel. Isabel produced several inappropriate and imprecise differential diagnoses.

CLINICAL RELEVANCE

Despite the potential utility of large language models in generating medical diagnoses, physicians must continue to exercise great caution and apply their clinical judgment when making diagnostic decisions.
