Flannery Brennan T, Sandler Howard M, Lal Priti, Feldman Michael D, Santa-Rosario Juan C, Pathak Tilak, Mirtti Tuomas, Farre Xavier, Correa Rohann, Chafe Susan, Shah Amit, Efstathiou Jason A, Hoffman Karen, Hallman Mark A, Straza Michael, Jordan Richard, Pugh Stephanie L, Feng Felix, Madabhushi Anant
Case Western Reserve University, Cleveland, OH, USA.
Cedars-Sinai Medical Center, Los Angeles, CA, USA.
J Pathol. 2025 Feb;265(2):146-157. doi: 10.1002/path.6373. Epub 2024 Dec 11.
The presence, location, and extent of prostate cancer are assessed by pathologists using H&E-stained tissue slides. Machine learning approaches can accomplish these tasks for both biopsies and radical prostatectomies. Deep learning approaches using convolutional neural networks (CNNs) have been shown to identify cancer in pathology slides, with some securing regulatory approval for clinical use. However, differences in sample processing can subtly alter the morphology between sample types, making it unclear whether deep learning algorithms will perform consistently on both types of slide images. Our goal was to investigate whether morphological differences between sample types affected the performance of biopsy-trained cancer detection CNN models when applied to radical prostatectomies, and vice versa, using multiple cohorts (N = 1,000). Radical prostatectomies (N = 100) and biopsies (N = 50) were acquired from The University of Pennsylvania to train (80%) and validate (20%) separate DenseNet CNN models on biopsies, on radical prostatectomies, and on the combined dataset. At the tile level, the biopsy-trained and prostatectomy-trained models achieved F1 scores greater than 0.88 when applied to their own sample type but less than 0.65 when applied across sample types. At the whole-slide level, each model achieved significantly better performance on its own sample type than the alternative model did (p < 0.05) for all metrics. This was confirmed by external validation on digitized biopsy slide images from a clinical trial of the NRG Radiation Therapy Oncology Group (NRG/RTOG 0521, N = 750), via both qualitative and quantitative analyses (p < 0.05). A comprehensive review of model outputs revealed morphologically driven decision making that adversely affected model performance. One model appeared to be challenged by open gland structures, whereas the other appeared to be challenged by closed gland structures, indicating potential morphological variation between the training sets.
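The tile-level comparison described above contrasts a model's F1 score on its own sample type with its score on the other sample type. The sketch below illustrates that computation with synthetic placeholder tile labels and predictions (not data from the study); the binary F1 definition itself is standard.

```python
# Minimal sketch: tile-level F1 for a model evaluated on its own sample
# type versus a model trained on the other sample type. The labels and
# predictions below are illustrative stand-ins, not study data.

def f1_score(y_true, y_pred):
    """Binary F1: harmonic mean of precision and recall (cancer tile = 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical ground-truth labels for ten biopsy tiles (1 = cancer).
biopsy_labels = [1, 1, 1, 0, 0, 1, 0, 1, 0, 1]
# A model trained on the same sample type misses only one tile...
preds_same_type = [1, 1, 1, 0, 0, 1, 0, 1, 0, 0]
# ...while a model trained on the other sample type errs far more often.
preds_cross_type = [1, 0, 0, 0, 1, 1, 1, 0, 0, 1]

print(round(f1_score(biopsy_labels, preds_same_type), 2))   # 0.91 (same type)
print(round(f1_score(biopsy_labels, preds_cross_type), 2))  # 0.55 (cross type)
```

The toy numbers are chosen so the gap mirrors the pattern reported in the abstract (F1 above 0.88 within sample type, below 0.65 across sample types).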
These findings suggest that differences in morphology and heterogeneity between sample types underscore the need for more tailored, sample-specific (i.e. biopsy and surgical) machine learning models. © 2024 The Author(s). The Journal of Pathology published by John Wiley & Sons Ltd on behalf of The Pathological Society of Great Britain and Ireland.