Evans Harriet, Sivakumar Naveen, Bhanderi Shivam, Graham Simon, Snead David, Patel Abhilasha, Robinson Andrew
University of Warwick, Coventry, UK
Histopathology, University Hospitals Coventry and Warwickshire NHS Trust, Coventry, UK.
BMJ Open Gastroenterol. 2025 Jan 6;12(1):e001649. doi: 10.1136/bmjgast-2024-001649.
Artificial intelligence (AI) tools for histological diagnosis offer great potential for healthcare, yet a failure to understand their clinical context is delaying adoption. IGUANA (Interpretable Gland-Graphs using a Neural Aggregator) is an AI algorithm that can effectively classify colonic biopsies as normal or abnormal and is designed to report normal cases automatically. We performed a retrospective pathological and clinical review of the errors made by IGUANA.
False negative (FN) errors were the primary focus because they carry the greatest potential for harm. Pathological evaluation involved assessing whole slide image (WSI) quality, establishing the precise diagnosis of each missed entity and identifying factors that impeded diagnosis. Clinical evaluation scored the impact of each error on the patient and described the type of impact in terms of missed diagnosis, investigations or treatment.
Across 5054 WSIs from 2080 UK National Health Service patients, there were 220 FN errors across 164 cases (4.4% of WSIs, 7.9% of cases). Missed diagnoses ranged from adenocarcinoma to mild inflammation. Overall, 88.4% of FN errors would have had no impact on patient care, with only one error causing major patient harm. Factors that protected against harm included the missed biopsy being a low-risk polyp or the diagnostic features being detected in other biopsies.
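The two error rates above are computed at different levels: per WSI and per case. The sketch below reproduces them from the reported counts. It assumes (this is not stated explicitly in the abstract) that each FN error corresponds to one WSI and that a case counts as an FN case when at least one of its WSIs is an FN error; the helper names `fn_rates` and `case_level_fn` are hypothetical and used only for illustration.

```python
def fn_rates(fn_wsis: int, total_wsis: int, fn_cases: int, total_cases: int) -> tuple[float, float]:
    """Return (WSI-level FN rate, case-level FN rate) as percentages."""
    wsi_rate = 100 * fn_wsis / total_wsis
    case_rate = 100 * fn_cases / total_cases
    return wsi_rate, case_rate


def case_level_fn(slide_fn_flags: dict[str, list[bool]]) -> dict[str, bool]:
    """Hypothetical aggregation: a case is an FN case if any of its WSIs is an FN error."""
    return {case_id: any(flags) for case_id, flags in slide_fn_flags.items()}


# Counts reported in the abstract: 220 FN WSIs of 5054, in 164 of 2080 cases.
wsi_rate, case_rate = fn_rates(220, 5054, 164, 2080)
print(f"{wsi_rate:.1f}% of WSIs, {case_rate:.1f}% of cases")

# Toy, made-up example of the aggregation step (not study data).
print(case_level_fn({"case_A": [False, True, False], "case_B": [False, False]}))
```

Because several WSIs belong to each case, one missed slide is enough to make the whole case an error, which is why the case-level rate (about 7.9%) is higher than the WSI-level rate (about 4.4%).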
Most FN errors would not result in patient harm, suggesting that, even with a 7.9% case-level error rate, this AI tool may be more suitable for adoption than its headline statistics suggest. Considering the clinical context of AI tool errors is essential for safe implementation.