Accuracy of ChatGPT-Generated Information on Head and Neck and Oromaxillofacial Surgery: A Multicenter Collaborative Analysis.

Authors

Vaira Luigi Angelo, Lechien Jerome R, Abbate Vincenzo, Allevi Fabiana, Audino Giovanni, Beltramini Giada Anna, Bergonzani Michela, Bolzoni Alessandro, Committeri Umberto, Crimi Salvatore, Gabriele Guido, Lonardi Fabio, Maglitto Fabio, Petrocelli Marzia, Pucci Resi, Saponaro Gianmarco, Tel Alessandro, Vellone Valentino, Chiesa-Estomba Carlos Miguel, Boscolo-Rizzo Paolo, Salzano Giovanni, De Riu Giacomo

Affiliations

Maxillofacial Surgery Operative Unit, Department of Medicine, Surgery and Pharmacy, University of Sassari, Sassari, Italy.

Biomedical Sciences Department, PhD School of Biomedical Science, University of Sassari, Sassari, Italy.

Publication information

Otolaryngol Head Neck Surg. 2024 Jun;170(6):1492-1503. doi: 10.1002/ohn.489. Epub 2023 Aug 18.

DOI: 10.1002/ohn.489
PMID: 37595113

Abstract

OBJECTIVE

To investigate the accuracy of Chat-Based Generative Pre-trained Transformer (ChatGPT) in answering questions and solving clinical scenarios of head and neck surgery.

STUDY DESIGN

Observational and evaluative study.

SETTING

Eighteen surgeons from 14 Italian head and neck surgery units.

METHODS

A total of 144 clinical questions covering different subspecialties of head and neck surgery and 15 comprehensive clinical scenarios were developed. The questions and scenarios were entered into ChatGPT4, and the researchers evaluated the resulting answers using Likert scales for accuracy (range 1-6), completeness (range 1-3), and quality of references.

RESULTS

The overall median score for the open-ended questions was 6 (interquartile range [IQR]: 5-6) for accuracy and 3 (IQR: 2-3) for completeness. Overall, the reviewers rated the answers as entirely or nearly entirely correct in 87.2% of cases and as comprehensive, covering all aspects of the question, in 73% of cases. The artificial intelligence (AI) model gave a correct response to 84.7% of the closed-ended questions (11 wrong answers). In the clinical scenarios, ChatGPT provided a fully or nearly fully correct diagnosis in 81.7% of cases. The proposed diagnostic or therapeutic procedure was judged to be complete in 56.7% of cases. The overall quality of the bibliographic references was poor, and the cited sources were nonexistent in 46.4% of cases.
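The abstract gives the closed-ended result as a percentage (84.7%) together with a count of wrong answers (11), but it does not state how many of the 144 questions were closed-ended. As a rough consistency check, and not a figure taken from the paper, the sketch below searches for question counts that would round to the reported percentage. The function name `consistent_totals` and its parameters are hypothetical and introduced only for this illustration.

```python
def consistent_totals(wrong, reported_pct, max_total):
    """Return every question count n for which the share of correct
    answers, (n - wrong) / n, rounds to reported_pct (one decimal place)."""
    return [
        n
        for n in range(wrong + 1, max_total + 1)
        if round(100 * (n - wrong) / n, 1) == reported_pct
    ]


# Assumption: the closed-ended questions are a subset of the 144 questions,
# so the search is capped at 144.
candidates = consistent_totals(wrong=11, reported_pct=84.7, max_total=144)
print(candidates)  # [72]
```

Under these assumptions the only count consistent with both reported numbers is 72 closed-ended questions (61 correct, 11 wrong), which would leave 72 open-ended questions. This is an inference from the abstract's rounded figures and should be checked against the full text.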

CONCLUSION

The results generally demonstrate a good level of accuracy in the AI's answers. The AI's ability to resolve complex clinical scenarios is promising, but it still falls short of being a reliable support for the decision-making process of specialists in head and neck surgery.
