Schmidt Bogdana, Soerensen Simon John Christoph, Bhambhvani Hriday P, Fan Richard E, Bhattacharya Indrani, Choi Moon Hyung, Kunder Christian A, Kao Chia-Sui, Higgins John, Rusu Mirabela, Sonn Geoffrey A
Division of Urology, Department of Surgery, Huntsman Cancer Hospital, University of Utah, Salt Lake City, UT, USA.
Department of Urology, Stanford University School of Medicine, Stanford, CA, USA.
BJU Int. 2025 Jan;135(1):133-139. doi: 10.1111/bju.16464. Epub 2024 Jul 11.
To externally validate the performance of the DeepDx Prostate artificial intelligence (AI) algorithm (Deep Bio Inc., Seoul, South Korea) for Gleason grading on whole-mount prostate histopathology, given that AI models trained on biopsy samples may perform differently on radical prostatectomy (RP) specimens because of inherent differences in tissue representation and sample size.
The commercially available DeepDx Prostate AI algorithm is an automated Gleason grading system that was previously trained using 1133 prostate core biopsy images and validated on 700 biopsy images from two institutions. We assessed the performance of the AI algorithm, which outputs Gleason patterns (3, 4, or 5), on 500 1-mm tiles created from 150 whole-mount RP specimens from a third institution. These patterns were then grouped into grade groups (GGs) for comparison with expert pathologist assessments. The reference standard was the International Society of Urological Pathology GG as established by two experienced uropathologists, with a third expert adjudicating discordant cases. The main metric was agreement with the reference standard, measured using Cohen's kappa.
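The agreement metric named above can be illustrated with a minimal sketch of Cohen's kappa in plain Python, covering both the unweighted form and the quadratically weighted form. The function name `cohen_kappa`, its parameters, and the encoding of non-cancerous tissue as category 0 are assumptions made for illustration; this is not the study's analysis code, and the ratings shown are invented toy values.

```python
def cohen_kappa(rater_a, rater_b, categories, weights=None):
    """Cohen's kappa for two raters over ordered categories.

    weights=None gives the unweighted kappa; weights="quadratic"
    penalizes disagreements by the squared distance between categories.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    k = len(categories)
    if k < 2:
        raise ValueError("at least two categories are required")

    index = {c: i for i, c in enumerate(categories)}
    n = len(rater_a)

    # Observed joint proportions for each pair of categories.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[index[a]][index[b]] += 1 / n

    # Marginal proportions for each rater.
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def disagreement(i, j):
        if weights == "quadratic":
            return ((i - j) / (k - 1)) ** 2
        return 0.0 if i == j else 1.0

    pairs = [(i, j) for i in range(k) for j in range(k)]
    observed_dis = sum(disagreement(i, j) * observed[i][j] for i, j in pairs)
    expected_dis = sum(disagreement(i, j) * row[i] * col[j] for i, j in pairs)
    if expected_dis == 0:
        raise ValueError("kappa is undefined when no disagreement is expected")
    return 1 - observed_dis / expected_dis


# Toy tile-level ratings (hypothetical; 0 = non-cancerous, 1-5 = GG).
reference = [0, 1, 2, 2]
algorithm = [0, 1, 2, 0]
kappa_qw = cohen_kappa(reference, algorithm, categories=[0, 1, 2],
                       weights="quadratic")
```

Quadratic weighting suits ordinal scales such as GGs because a GG 1 vs GG 5 disagreement counts far more than a GG 1 vs GG 2 disagreement, whereas the unweighted form treats every disagreement equally and fits the binary cancerous vs non-cancerous comparison.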
The agreement between the two experienced pathologists in determining GGs at the tile level had a quadratically weighted Cohen's kappa of 0.94. The agreement between the AI algorithm and the reference standard in differentiating cancerous vs non-cancerous tissue had an unweighted Cohen's kappa of 0.91. Additionally, the AI algorithm's agreement with the reference standard in classifying tiles into GGs had a quadratically weighted Cohen's kappa of 0.89. In distinguishing cancerous vs non-cancerous tissue, the AI algorithm achieved a sensitivity of 0.997 and specificity of 0.88; in classifying GG ≥2 vs GG 1 and non-cancerous tissue, it demonstrated a sensitivity of 0.98 and specificity of 0.85.
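The sensitivity and specificity figures above come from dichotomizing each tile in two ways: cancerous vs non-cancerous, and GG ≥2 vs GG 1 or non-cancerous. A minimal sketch of that calculation follows; the helper name `sensitivity_specificity`, the encoding of non-cancerous tissue as 0, and the tile values are assumptions for illustration only and do not reproduce the study's data.

```python
def sensitivity_specificity(reference, predicted):
    """Return (sensitivity, specificity) for paired boolean labels."""
    if len(reference) != len(predicted):
        raise ValueError("label lists must have equal length")
    tp = fn = tn = fp = 0
    for ref, pred in zip(reference, predicted):
        if ref and pred:
            tp += 1
        elif ref and not pred:
            fn += 1
        elif not ref and not pred:
            tn += 1
        else:
            fp += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative reference")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical tile-level grade groups (0 = non-cancerous, 1-5 = GG).
reference_gg = [0, 1, 2, 3, 0, 5]
algorithm_gg = [0, 2, 2, 3, 1, 4]

# Dichotomization 1: cancerous (GG >= 1) vs non-cancerous.
cancer = sensitivity_specificity(
    [g >= 1 for g in reference_gg], [g >= 1 for g in algorithm_gg]
)

# Dichotomization 2: GG >= 2 vs GG 1 and non-cancerous.
gg2_plus = sensitivity_specificity(
    [g >= 2 for g in reference_gg], [g >= 2 for g in algorithm_gg]
)
```

In this toy data the algorithm never misses a positive tile but overcalls one tile at each threshold, the same qualitative pattern as the reported results (sensitivity higher than specificity).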
The DeepDx Prostate AI algorithm showed excellent agreement with expert uropathologists, and excellent performance in cancer identification and grading, on RP specimens, despite being trained on biopsy specimens from an entirely different patient population.