Frosolini Andrea, Catarzi Lisa, Benedetti Simone, Latini Linda, Chisci Glauco, Franz Leonardo, Gennaro Paolo, Gabriele Guido
Maxillofacial Surgery Unit, Department of Medical Biotechnologies, University of Siena, 53100 Siena, Italy.
Phoniatrics and Audiology Unit, Department of Neuroscience DNS, University of Padova, 35122 Treviso, Italy.
Diagnostics (Basel). 2024 Apr 18;14(8):839. doi: 10.3390/diagnostics14080839.
In the evolving field of maxillofacial surgery, integrating advanced technologies such as Large Language Models (LLMs) into medical practice, especially for trauma triage, holds promising yet largely unexplored potential. This study aimed to evaluate the feasibility of using LLMs to triage complex maxillofacial trauma cases by comparing their performance against the expertise of a tertiary referral center.
Based on a comprehensive review of patient records at a tertiary referral center over a one-year period, standardized prompts detailing patient demographics, injury characteristics, and medical histories were created. These prompts were used to compare the triage suggestions of ChatGPT 4.0 and Google GEMINI with the center's recommendations. The AI's performance was additionally evaluated using the QAMAI and AIPI questionnaires.
Results from 10 cases of major maxillofacial trauma indicated moderate agreement between the LLM recommendations and the referral center, with some variance in the suggestion of appropriate examinations (70% ChatGPT and 50% GEMINI) and treatment plans (60% ChatGPT and 45% GEMINI). Notably, the questionnaires showed no statistically significant differences between the two models in several areas, except for diagnostic accuracy (GEMINI: 3.30, ChatGPT: 2.30; p = 0.032) and relevance of the recommendations (GEMINI: 2.90, ChatGPT: 3.50; p = 0.021). A Spearman correlation analysis highlighted significant correlations between the two questionnaires, specifically between the QAMAI total score and the AIPI treatment scores (rho = 0.767, p = 0.010).
This exploratory investigation underscores the potential of LLMs to enhance clinical decision-making in maxillofacial trauma cases and indicates a need for further research to refine their application in healthcare settings.