Department of Pathology and Laboratory Medicine, School of Medicine and Public Health, University of Wisconsin-Madison, Madison.
PathomIQ.
JAMA Netw Open. 2021 Nov 1;4(11):e2132554. doi: 10.1001/jamanetworkopen.2021.32554.
IMPORTANCE: The Gleason grading system has been the most reliable tool for the prognosis of prostate cancer since its development. However, its clinical application remains limited by interobserver variability in grading and quantification, which has negative consequences for risk assessment and clinical management of prostate cancer.
OBJECTIVE: To examine the impact of an artificial intelligence (AI)-assisted approach to prostate cancer grading and quantification.
DESIGN, SETTING, AND PARTICIPANTS: This diagnostic study was conducted at the University of Wisconsin-Madison from August 2, 2017, to December 30, 2019. The study chronologically selected 589 men with biopsy-confirmed prostate cancer who received care in the University of Wisconsin Health System between January 1, 2005, and February 28, 2017. A total of 1000 biopsy slides (1 or 2 slides per patient) were selected and scanned to create digital whole-slide images, which were used to develop and validate a deep convolutional neural network-based AI-powered platform. The whole-slide images were divided into a training set (n = 838) and validation set (n = 162). Three experienced academic urological pathologists (W.H., K.A.I., and R.H., hereinafter referred to as pathologists 1, 2, and 3, respectively) were involved in the validation. Data were collected between December 29, 2018, and December 20, 2019, and analyzed from January 4, 2020, to March 1, 2021.
MAIN OUTCOMES AND MEASURES: Accuracy of prostate cancer detection by the AI-powered platform and comparison of prostate cancer grading and quantification performed by the 3 pathologists using manual vs AI-assisted methods.
RESULTS: Among 589 men with biopsy slides, the mean (SD) age was 63.8 (8.2) years, the mean (SD) prebiopsy prostate-specific antigen level was 10.2 (16.2) ng/mL, and the mean (SD) total cancer volume was 15.4% (20.1%). The AI system was able to distinguish prostate cancer from benign prostatic epithelium and stroma with high accuracy at the patch-pixel level, with an area under the receiver operating characteristic curve of 0.92 (95% CI, 0.88-0.95). The AI system achieved almost perfect agreement with the training pathologist (pathologist 1) in detecting prostate cancer at the patch-pixel level (weighted κ = 0.97; asymptotic 95% CI, 0.96-0.98) and in grading prostate cancer at the slide level (weighted κ = 0.98; asymptotic 95% CI, 0.96-1.00). Use of the AI-assisted method was associated with significant improvements in the concordance of prostate cancer grading and quantification between the 3 pathologists (eg, pathologists 1 and 2: 90.1% agreement using AI-assisted method vs 84.0% agreement using manual method; P < .001) and significantly higher weighted κ values for all pathologists (eg, pathologists 2 and 3: weighted κ = 0.92 [asymptotic 95% CI, 0.90-0.94] for AI-assisted method vs 0.76 [asymptotic 95% CI, 0.71-0.80] for manual method; P < .001) compared with the manual method.
CONCLUSIONS AND RELEVANCE: In this diagnostic study, an AI-powered platform was able to detect, grade, and quantify prostate cancer with high accuracy and efficiency and was associated with significant reductions in interobserver variability. These results suggest that an AI-powered platform could potentially transform histopathological evaluation and improve risk stratification and clinical management of prostate cancer.
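The interobserver results above are reported as percentage agreement and weighted κ. As a rough illustration of how those two statistics relate, the following sketch computes both for two raters assigning ordinal grade groups. It is not the study's analysis code: the function names, the choice of linear weights (the abstract does not say whether linear or quadratic weights were used), and the toy ratings are all assumptions made for illustration only.

```python
def percent_agreement(rater_a, rater_b):
    """Fraction of cases on which the two raters assign the same grade."""
    matches = sum(x == y for x, y in zip(rater_a, rater_b))
    return matches / len(rater_a)


def weighted_kappa(rater_a, rater_b, categories, weights="linear"):
    """Cohen's weighted kappa for two raters on an ordinal scale.

    `categories` lists the ordinal levels in order (needs at least 2).
    `weights` is "linear" or "quadratic"; which scheme the study used
    is not stated in the abstract, so "linear" here is an assumption.
    """
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(rater_a)

    # Observed joint proportions of (grade by A, grade by B).
    observed = [[0.0] * k for _ in range(k)]
    for x, y in zip(rater_a, rater_b):
        observed[index[x]][index[y]] += 1.0 / n

    # Marginal proportions for each rater.
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def disagreement(i, j):
        d = abs(i - j) / (k - 1)
        return d if weights == "linear" else d * d

    # Weighted disagreement actually observed vs expected by chance.
    num = sum(disagreement(i, j) * observed[i][j]
              for i in range(k) for j in range(k))
    den = sum(disagreement(i, j) * row[i] * col[j]
              for i in range(k) for j in range(k))
    if den == 0:
        return 1.0  # both raters used a single identical category
    return 1.0 - num / den


# Hypothetical toy ratings (grade groups 1-5); NOT data from the study.
manual_a = [1, 2, 3, 4, 5, 1, 2, 3, 4, 5]
manual_b = [1, 2, 3, 4, 5, 1, 2, 3, 4, 4]

print(percent_agreement(manual_a, manual_b))
print(weighted_kappa(manual_a, manual_b, categories=[1, 2, 3, 4, 5]))
```

Unlike raw percentage agreement, weighted κ corrects for agreement expected by chance and penalizes a disagreement more heavily the further apart the two grades are, which is why it suits an ordinal scale such as Gleason grade groups.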