Tadrousse Kirollos, Cash Catherine A, Kastury Madhulika R, Thompson Noelle, Simman Richard
From the College of Medicine and Life Sciences, University of Toledo, Toledo, OH.
Department of Surgery, College of Medicine and Life Sciences, University of Toledo, Toledo, OH.
Plast Reconstr Surg Glob Open. 2025 Jun 12;13(6):e6871. doi: 10.1097/GOX.0000000000006871. eCollection 2025 Jun.
Chronic wounds affect approximately 2.5% of the US population and can cause severe complications if not identified and treated promptly. Artificial intelligence tools such as Microsoft's Copilot have the potential to expedite diagnosis, but their clinical diagnostic accuracy remains underexplored.
Ten chronic wound cases were selected from the publicly available database of the Silesian University of Technology. Images and demographic data were entered into Copilot, which generated the top 3 differential diagnoses for each case. Diagnostic accuracy was evaluated using a predefined scoring system. Statistical analysis included descriptive statistics, the Wilcoxon signed-rank test, bootstrapping, the Fisher-Pitman permutation test, Cohen kappa, and Fisher exact test.
Copilot correctly identified the primary diagnosis in 30% of cases and included the correct diagnosis within its top 3 differentials in 70% of cases. The mean diagnostic score was 1.7 (median: 2, SD: 1.25, variance: 1.57). The Wilcoxon test indicated no significant deviation from the median reference value (P = 0.6364), whereas bootstrapping yielded a 95% confidence interval of 1 to 4. The permutation test demonstrated a significant difference from the null hypothesis (P = 0.017), and the Cohen kappa revealed perfect agreement (kappa = 1, P = 0.00157). The Fisher exact test showed no significant association between primary and top 3 diagnostic accuracy (P = 0.20).
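To make the Fisher exact test on primary versus top 3 accuracy concrete, the following is a minimal sketch of a two-sided test on a 2x2 table, written with the Python standard library only. The cell counts are an assumption, not the authors' data: they are rebuilt from the reported proportions (3 of 10 primary diagnoses correct, 7 of 10 correct within the top 3) on the further assumption that every correct primary diagnosis also counts as a top 3 hit. The function name `fisher_exact_two_sided` is illustrative.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell,
        # with all row and column totals held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lo = max(0, col1 - row2)   # smallest feasible top-left count
    hi = min(row1, col1)       # largest feasible top-left count
    p_obs = prob(a)
    # Sum the probabilities of every table no more likely than the observed one
    # (small tolerance guards against floating-point ties).
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


# ASSUMED counts, reconstructed from the reported 30% and 70% (n = 10):
# rows    = primary diagnosis correct / incorrect
# columns = correct diagnosis in top 3 / not in top 3
table = [[3, 0],
         [4, 3]]
p_value = fisher_exact_two_sided(table[0][0], table[0][1], table[1][0], table[1][1])
print(f"two-sided Fisher exact P = {p_value:.4f}")
```

Because the authors' actual contingency table and test sidedness are not given in the abstract, this reconstructed table need not reproduce the reported P value; the sketch only shows how such a test is computed.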
Microsoft Copilot demonstrated limited diagnostic accuracy in chronic wound assessment, underscoring the need for cautious integration into clinical workflows. Broader datasets and more rigorous validation are crucial for enhancing artificial intelligence-supported diagnostics in wound care.