Assessing Accuracy of Chat Generative Pre-Trained Transformer's Responses to Common Patient Questions Regarding Congenital Upper Limb Differences.

Author information

Zeller Niklaus P, Shah Ayush D, Van Heest Ann E, Bohn Deborah C

Affiliations

University of Minnesota Medical School, Minneapolis, MN.

Department of Orthopedic Surgery, University of Minnesota, Minneapolis, MN.

Publication information

J Hand Surg Glob Online. 2025 May 31;7(4):100764. doi: 10.1016/j.jhsg.2025.100764. eCollection 2025 Jul.

DOI:10.1016/j.jhsg.2025.100764

PMID:40520541

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12164003/

Abstract

PURPOSE

This study assessed the ability of Chat Generative Pre-Trained Transformer (ChatGPT) 4.0 to accurately and reliably answer patients' frequently asked questions (FAQs) about congenital upper limb differences (CULDs) and their treatment options.

METHODS

Two pediatric hand surgeons were asked which FAQs they receive from parents about CULDs. Sixteen FAQs were input to ChatGPT-4.0 for the following conditions: (1) syndactyly, (2) polydactyly, (3) radial longitudinal deficiency, (4) thumb hypoplasia, and (5) general congenital hand differences. Two additional psychosocial care questions were queried, and the surgeons graded all responses on a scale of 1-4 based on response quality. An independent chat was used for each question to reduce memory-retention bias, and the software application received no pretraining.

RESULTS

Overall, ChatGPT provided relatively reliable, evidence-based responses to the 16 queried FAQs. In total, 164 grades were assigned to the 82 ChatGPT responses: 83 (51%) did not require any clarification, 37 (23%) required minimal clarification, 32 (20%) required moderate clarification, and 13 (8%) received an unsatisfactory rating. However, there was considerable variability in the depth of many responses. When queried on medical associations with syndactyly and polydactyly, ChatGPT provided a detailed account of associated syndromes, although there was no mention that syndromic involvement is relatively rare. Furthermore, ChatGPT recommended that the patients consult a health care provider for individualized care 81 times in 49 responses. It commonly "referred" patients to genetic counselors (n = 26, 32%), followed by pediatric orthopedic surgeons and orthopedic surgeons (n = 16, 20%), and hand surgeons (n = 9, 11%).
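The percentages reported above can be reproduced with simple arithmetic. The following is a minimal sketch, not the authors' analysis: the counts are copied from the paragraph above, the dictionary labels are shorthand, and the choice of denominators (164 grades for the rating categories, 82 responses for the referral categories) is an assumption inferred from the reported figures.

```python
# Counts copied from the RESULTS paragraph; the labels are shorthand, not the authors' wording.
grade_counts = {
    "no clarification": 83,
    "minimal clarification": 37,
    "moderate clarification": 32,
    "unsatisfactory": 13,
}
referral_counts = {
    "genetic counselors": 26,
    "pediatric orthopedic and orthopedic surgeons": 16,
    "hand surgeons": 9,
}

# Denominators as stated in the abstract (assumed to be the ones the authors used).
REPORTED_GRADES = 164
REPORTED_RESPONSES = 82


def percentages(counts, denominator):
    """Return each count as a whole-number percentage of the denominator."""
    return {label: round(100 * n / denominator) for label, n in counts.items()}


grade_pct = percentages(grade_counts, REPORTED_GRADES)
referral_pct = percentages(referral_counts, REPORTED_RESPONSES)
count_sum = sum(grade_counts.values())

print(grade_pct)
print(referral_pct)
print(count_sum, "grades by category vs", REPORTED_GRADES, "reported")
```

Under these assumptions the rounded percentages match the reported 51%, 23%, 20%, and 8% for grades and 32%, 20%, and 11% for referrals. The four category counts sum to 165, one more than the 164 grades stated, which is why the rounded percentages add up to slightly more than 100%; the abstract does not explain the difference.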

CONCLUSIONS

Chat Generative Pre-Trained Transformer provided evidence-based responses not requiring clarification to a majority of FAQs about CULDs. However, there was considerable variation across the responses, and it rarely "referred" patients to hand surgeons. Patients seeking information about CULDs should approach ChatGPT and similar large language models, which are new tools for patient education, with caution. Responses do not consistently provide comprehensive, individualized information, and 8% of responses were misleading.

TYPE OF STUDY/LEVEL OF EVIDENCE: Economic/decision analysis IIC.
