Tissue Image Analytics Centre, Department of Computer Science, University of Warwick, Coventry, UK; Department of Artificial Intelligence and Data Science, National University of Computer and Emerging Sciences, Islamabad, Pakistan.
Department of Pathology, University Hospitals Coventry and Warwickshire National Health Service Trust, Coventry, UK.
Lancet Digit Health. 2023 Nov;5(11):e786-e797. doi: 10.1016/S2589-7500(23)00148-6.
BACKGROUND: Histopathological examination is a crucial step in the diagnosis and treatment of many major diseases. Aiming to facilitate diagnostic decision making and reduce the workload of pathologists, we developed an artificial intelligence (AI)-based prescreening tool that analyses whole-slide images (WSIs) of large-bowel biopsies to identify typical, non-neoplastic, and neoplastic biopsies.
METHODS: This retrospective cohort study used an internal development cohort of slides acquired from one hospital in the UK and three external validation cohorts of WSIs acquired from two hospitals in the UK and one clinical laboratory in Portugal. To learn differential histological patterns from digitised WSIs of large-bowel biopsy slides, our proposed weakly supervised deep-learning model (Colorectal AI Model for Abnormality Detection [CAIMAN]) used only slide-level diagnostic labels, without detailed cell-level or region-level annotations. The method was developed with an internal development cohort of 5054 biopsy slides from 2080 patients, labelled with the diagnostic categories assigned by pathologists. The three external validation cohorts, comprising a total of 1536 slides, were used for independent validation of CAIMAN. Each WSI was classified into one of three classes (ie, typical, atypical non-neoplastic, and atypical neoplastic). Prediction scores of image tiles were aggregated into three slide-level prediction scores, representing the likelihood that the slide was typical, non-neoplastic, or neoplastic. The external validation cohorts were assessed with the trained, frozen CAIMAN model. To evaluate model performance, we calculated the area under the convex hull of the receiver operating characteristic curve (AUROC), the area under the precision-recall curve, and specificity, and compared CAIMAN on these metrics with our previously published iterative draw and rank sampling (IDaRS) algorithm.
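The tile-to-slide aggregation and the convex-hull AUROC described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not CAIMAN's implementation: the abstract does not say how tile scores are pooled, so mean pooling, the class order, the "one minus typical" atypical score, and all helper names (`aggregate_tiles`, `atypical_score`, `roc_points`, `auc_convex_hull`) are hypothetical. The convex-hull area is computed with Andrew's monotone-chain upper hull over the empirical ROC points.

```python
from typing import List, Sequence, Tuple

# Hypothetical class order: 0 = typical, 1 = atypical non-neoplastic,
# 2 = atypical neoplastic.
N_CLASSES = 3


def aggregate_tiles(tile_probs: Sequence[Sequence[float]]) -> List[float]:
    """Mean-pool per-tile class probabilities into three slide-level scores.

    Mean pooling is an illustrative assumption, not CAIMAN's documented method.
    """
    if not tile_probs:
        raise ValueError("slide has no tiles")
    n = len(tile_probs)
    return [sum(tile[c] for tile in tile_probs) / n for c in range(N_CLASSES)]


def atypical_score(slide_scores: Sequence[float]) -> float:
    """Hypothetical atypical score: one minus the typical-class score."""
    return 1.0 - slide_scores[0]


def roc_points(scores: Sequence[float], labels: Sequence[int]) -> List[Tuple[float, float]]:
    """Empirical ROC points (FPR, TPR); label 1 = atypical (positive)."""
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need both atypical and typical slides")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Slides with tied scores cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc_convex_hull(points: Sequence[Tuple[float, float]]) -> float:
    """Area under the upper convex hull of ROC points, via the trapezoidal rule."""
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    hull: List[Tuple[float, float]] = []
    for px, py in pts:
        # Andrew's monotone chain (upper hull): drop the last hull point
        # unless the path turns clockwise there.
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            cross = (x2 - x1) * (py - y1) - (y2 - y1) * (px - x1)
            if cross >= 0:
                hull.pop()
            else:
                break
        hull.append((px, py))
    return sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(hull, hull[1:]))
```

Because the hull bridges concave dips in the empirical curve, its area is never smaller than the plain trapezoidal AUROC of the same points.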
We also generated heat maps and saliency maps to analyse and visualise the relationship between WSI diagnostic labels and spatial features of the tissue microenvironment. The main outcome of this study was the ability of CAIMAN to accurately identify typical and atypical WSIs of colon biopsies, which could potentially facilitate the automatic removal of typical biopsies from the diagnostic workload in clinics.
FINDINGS: A randomly selected subset of all large-bowel biopsies was obtained between Jan 1, 2012, and Dec 31, 2017. AI training, validation, and assessments were done between Jan 1, 2021, and Sept 30, 2022. WSIs with diagnostic labels were collected between Jan 1 and Sept 30, 2022. Our analysis showed no statistically significant differences in CAIMAN prediction scores for the typical and atypical classes across the anatomical sites of the biopsies. At 0·99 sensitivity, CAIMAN (specificity 0·5592) was more accurate than an IDaRS-based weakly supervised WSI-classification pipeline (specificity 0·4629) in identifying typical and atypical biopsies on cross-validation in the internal development cohort (p<0·0001). At 0·99 sensitivity, CAIMAN was also more accurate than IDaRS for two of the external validation cohorts (p<0·0001), but not for the third (p=0·10). On one of the external validation cohorts, CAIMAN provided higher specificity than IDaRS at several high-sensitivity thresholds (0·7763 vs 0·6222 at 0·95 sensitivity, 0·7126 vs 0·5407 at 0·97 sensitivity, and 0·5615 vs 0·3970 at 0·99 sensitivity). CAIMAN also showed high classification performance in identifying neoplastic biopsies (AUROC 0·9928, 95% CI 0·9927-0·9929), inflammatory biopsies (0·9658, 0·9655-0·9661), and atypical biopsies (0·9789, 0·9786-0·9792). On the three external validation cohorts, CAIMAN had AUROC values of 0·9431 (95% CI 0·9165-0·9697), 0·9576 (0·9568-0·9584), and 0·9636 (0·9615-0·9657) for the detection of atypical biopsies.
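Specificity at a fixed sensitivity, the operating-point metric used in the comparison with IDaRS, can be computed as in the following sketch. It assumes that a slide is flagged atypical when its score is at or above the threshold, and it picks the highest threshold that still reaches the target sensitivity. The function name `specificity_at_sensitivity` is hypothetical, and the sketch does not reproduce the paper's statistical tests or confidence intervals.

```python
import math
from typing import Sequence, Tuple


def specificity_at_sensitivity(
    scores: Sequence[float], labels: Sequence[int], target: float
) -> Tuple[float, float]:
    """Return (threshold, specificity) at the highest threshold whose
    sensitivity is at least `target`.

    A slide is flagged atypical (positive, label 1) when score >= threshold.
    """
    if not 0.0 < target <= 1.0:
        raise ValueError("target sensitivity must be in (0, 1]")
    pos = sorted((s for s, y in zip(scores, labels) if y == 1), reverse=True)
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need both atypical and typical slides")
    # Smallest number of positives that must be flagged; the small epsilon
    # guards against a product such as 0.99 * n rounding slightly above
    # a whole number.
    k = max(1, math.ceil(target * len(pos) - 1e-9))
    threshold = pos[k - 1]
    # Specificity = typical slides correctly left unflagged / all typical slides.
    true_negatives = sum(1 for s in neg if s < threshold)
    return threshold, true_negatives / len(neg)
```

Calling it with `target` set to 0.95, 0.97, and 0.99 gives the kind of operating points reported above. Since specificity can only rise as the threshold rises, the highest qualifying threshold yields the best specificity achievable at that sensitivity.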
Saliency maps suggested that model predictions captured disease heterogeneity and were associated with relevant histological features.
INTERPRETATION: CAIMAN, with its high sensitivity in detecting atypical large-bowel biopsies, might help improve clinical workflow efficiency and diagnostic decision making in the prescreening of typical colorectal biopsies.
FUNDING: The Pathology Image Data Lake for Analytics, Knowledge and Education Centre of Excellence; the UK Government's Industrial Strategy Challenge Fund; and Innovate UK on behalf of UK Research and Innovation.
Comput Struct Biotechnol J. 2024-12-30