Robust, credible, and interpretable AI-based histopathological prostate cancer grading.

Authors

Westhaeusser Fabian, Fuhlert Patrick, Dietrich Esther, Lennartz Maximilian, Khatri Robin, Kaiser Nico, Röbeck Pontus, Bülow Roman, von Stillfried Saskia, Witte Anja, Ladjevardi Sam, Drotte Anders, Severgardh Peter, Baumbach Jan, Puelles Victor G, Häggman Michael, Brehler Michael, Boor Peter, Walhagen Peter, Dragomir Anca, Busch Christer, Graefen Markus, Bengtsson Ewert, Sauter Guido, Zimmermann Marina, Bonn Stefan

Affiliations

Institute of Medical Systems Biology, Center for Biomedical AI (bAIome), Center for Molecular Neurobiology Hamburg (ZMNH), University Medical Center Hamburg-Eppendorf, Hamburg, Germany.

Spearpoint Analytics AB, Stockholm, Sweden.

Publication information

medRxiv. 2024 Jul 10:2024.07.09.24310082. doi: 10.1101/2024.07.09.24310082.

Abstract

BACKGROUND

Prostate cancer (PCa) is among the most common cancers in men, and its diagnosis requires histopathological evaluation of biopsies by human experts. Although several recent artificial intelligence (AI)-based approaches have reached human expert-level PCa grading, they often perform significantly worse on external datasets. This drop can be caused by variations in sample preparation, for instance in the staining protocol, section thickness, or scanner used. Another limiting factor of contemporary AI-based PCa grading is that models are trained to predict ISUP grades, which perpetuates human annotation errors.

METHODS

We developed the prostate cancer aggressiveness index (PCAI), an AI-based PCa detection and grading framework that is trained on objective patient outcome rather than on subjective ISUP grades. We designed PCAI as a clinical application containing algorithmic modules that offer robustness to data variation, medical interpretability, and a measure of prediction confidence. To train and evaluate PCAI, we conducted a multicentric, retrospective, observational trial consisting of six cohorts with 25,591 patients, 83,864 images, and a median follow-up of 5 years, drawn from 5 different centers in 3 countries. This includes a high-variance dataset of 8,157 patients and 28,236 images with variations in sample thickness, staining protocol, and scanner, allowing for the systematic evaluation and optimization of model robustness to data variation. The performance of PCAI was assessed on three external test cohorts from two countries, comprising 2,255 patients and 9,437 images.

FINDINGS

Using our high-variance datasets, we show how differences in sample processing, particularly slide thickness and staining time, significantly reduce the performance of AI-based PCa grading, by up to 6.2 percentage points in the concordance index (C-index). We show how a select set of algorithmic improvements, including domain adversarial training, conferred robustness to data variation, interpretability, and a measure of credibility to PCAI. These changes led to significant prediction improvements across two biopsy cohorts and one tissue microarray (TMA) cohort, systematically exceeding expert ISUP grading in C-index and AUROC by up to 22 percentage points.
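For readers unfamiliar with the metric reported above, the following is a minimal sketch of how a concordance index can be computed for right-censored outcome data (Harrell's C-index). It is not the authors' evaluation code. The function name `concordance_index`, the toy inputs, and the simplification of skipping pairs with tied follow-up times are all assumptions made for illustration.

```python
from itertools import combinations


def concordance_index(times, events, risks):
    """Harrell's C-index for right-censored data (illustrative sketch).

    times  : observed follow-up time per patient
    events : 1 if the event was observed, 0 if the patient was censored
    risks  : predicted risk score (higher means worse expected outcome)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` is the patient with the shorter time.
        a, b = (i, j) if times[i] <= times[j] else (j, i)
        # Assumption: pairs with tied times are skipped. A pair is only
        # comparable if the shorter time belongs to an observed event.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0   # higher risk failed earlier: concordant
        elif risks[a] == risks[b]:
            concordant += 0.5   # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: four patients, the third one censored.
toy_times = [2, 4, 6, 8]
toy_events = [1, 1, 0, 1]
toy_risks = [0.9, 0.4, 0.5, 0.1]
toy_c_index = concordance_index(toy_times, toy_events, toy_risks)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so a difference of a few percentage points is a difference of a few hundredths on this scale.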

INTERPRETATION

Data variation poses serious risks for AI-based histopathological PCa grading, even when models are trained on large datasets. Algorithmic improvements for model robustness, interpretability, and credibility, combined with training on high-variance data and outcome-based severity prediction, give rise to robust models with PCa grading performance above ISUP level.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7dfa/11261944/88f0b3e4cd97/nihpp-2024.07.09.24310082v1-f0001.jpg