Diagnostic Accuracy of Microsoft's Copilot Artificial Intelligence in Chronic Wound Assessment: A Comparative Study.

Authors

Tadrousse Kirollos, Cash Catherine A, Kastury Madhulika R, Thompson Noelle, Simman Richard

Affiliations

From the College of Medicine and Life Sciences, University of Toledo, Toledo, OH.

Department of Surgery, College of Medicine and Life Sciences, University of Toledo, Toledo, OH.

Publication information

Plast Reconstr Surg Glob Open. 2025 Jun 12;13(6):e6871. doi: 10.1097/GOX.0000000000006871. eCollection 2025 Jun.

DOI:10.1097/GOX.0000000000006871

PMID:40510430

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12160731/

Abstract

BACKGROUND

Chronic wounds affect approximately 2.5% of the US population and can cause severe complications if not identified and treated promptly. Artificial intelligence tools such as Microsoft's Copilot have the potential to expedite diagnosis, but their clinical diagnostic accuracy remains underexplored.

METHODS

Ten chronic wound cases were selected from the publicly available database of the Silesian University of Technology. Images and demographic data were entered into Copilot, which generated the top 3 differential diagnoses for each case. Diagnostic accuracy was evaluated using a predefined scoring system. Statistical analysis included descriptive statistics, the Wilcoxon signed-rank test, bootstrapping, the Fisher-Pitman permutation test, Cohen kappa, and Fisher exact test.

RESULTS

Copilot correctly identified the primary diagnosis in 30% of cases and included the correct diagnosis within its top 3 differentials in 70% of cases. The mean diagnostic score was 1.7 (median: 2, SD: 1.25, variance: 1.57). The Wilcoxon test indicated no significant deviation from the median reference value (P = 0.6364), whereas bootstrapping yielded a 95% confidence interval of 1-4. The permutation test demonstrated a significant difference from the null hypothesis (P = 0.017), and the Cohen kappa revealed perfect agreement (kappa = 1, P = 0.00157). The Fisher exact test showed no significant association between primary and top 3 diagnostic accuracy (P = 0.20).
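The descriptive statistics reported above can be checked for internal consistency with a short sketch. The per-case scores below are hypothetical: the abstract does not list them or define the scoring scale, so the 0-3 scale and the score vector are assumptions chosen only because they reproduce the reported 30% and 70% accuracy figures and the reported mean, median, SD, and variance. The abstract also does not say which statistic the bootstrap interval refers to, so the percentile bootstrap of the mean shown here is an illustration of the technique, not the authors' procedure.

```python
import random
import statistics

# Hypothetical per-case scores (NOT taken from the paper). Assumed scale:
# 3 = correct primary diagnosis, 2 or 1 = correct diagnosis listed second
# or third, 0 = correct diagnosis absent from the top 3 differentials.
scores = [3, 3, 3, 2, 2, 2, 2, 0, 0, 0]


def describe(values):
    """Return sample descriptive statistics (SD and variance use n - 1)."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sd": statistics.stdev(values),
        "variance": statistics.variance(values),
    }


def bootstrap_mean_ci(values, n_resamples=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement, record each resample's mean, then sort.
    means = sorted(
        statistics.mean(rng.choices(values, k=n)) for _ in range(n_resamples)
    )
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


summary = describe(scores)
primary_rate = sum(s == 3 for s in scores) / len(scores)  # primary correct
top3_rate = sum(s > 0 for s in scores) / len(scores)      # correct in top 3
ci_low, ci_high = bootstrap_mean_ci(scores)
```

With ten cases, any resampling-based interval is wide, which is consistent with the abstract's call for broader datasets.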

CONCLUSIONS

Microsoft Copilot demonstrated limited diagnostic accuracy in chronic wound assessment, underscoring the need for cautious integration into clinical workflows. Broader datasets and more rigorous validation are crucial for enhancing artificial intelligence-supported diagnostics in wound care.
