Comparison of Artificial Intelligence Techniques to Evaluate Performance of a Classifier for Automatic Grading of Prostate Cancer From Digitized Histopathologic Images.

Affiliations

Department of Electrical and Computer Engineering, University of British Columbia, Vancouver, British Columbia, Canada.

Department of Urologic Sciences, University of British Columbia, Vancouver, British Columbia, Canada.

Publication information

JAMA Netw Open. 2019 Mar 1;2(3):e190442. doi: 10.1001/jamanetworkopen.2019.0442.

DOI:10.1001/jamanetworkopen.2019.0442

PMID:30848813

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6484626/

Abstract

IMPORTANCE

Proper evaluation of the performance of artificial intelligence techniques in the analysis of digitized medical images is paramount for the adoption of such techniques by the medical community and regulatory agencies.

OBJECTIVES

To compare several cross-validation (CV) approaches to evaluate the performance of a classifier for automatic grading of prostate cancer in digitized histopathologic images and compare the performance of the classifier when trained using data from 1 expert and multiple experts.

DESIGN, SETTING, AND PARTICIPANTS

This quality improvement study used tissue microarray data (333 cores) from 231 patients who underwent radical prostatectomy at the Vancouver General Hospital between June 27, 1997, and June 7, 2011. Digitized images of tissue cores were annotated by 6 pathologists for 4 classes (benign and Gleason grades 3, 4, and 5) between December 12, 2016, and October 5, 2017. Patches of 192 µm² were extracted from these images. There was no overlap between patches. A deep learning classifier based on convolutional neural networks was trained to predict a class label from among the 4 classes (benign and Gleason grades 3, 4, and 5) for each image patch. The classification performance was evaluated in leave-patches-out CV, leave-cores-out CV, and leave-patients-out 20-fold CV. The analysis was performed between November 15, 2018, and January 1, 2019.
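The three CV schemes differ only in which unit is held out together: individual patches, whole cores, or whole patients. The following is a minimal Python sketch of that idea, not the authors' code. The function names and the toy data (40 patients with 10 patches each) are hypothetical and chosen only for illustration.

```python
import random
from collections import defaultdict

def grouped_folds(group_ids, n_folds, seed=0):
    """Assign a fold to every sample so that all samples sharing a group id
    (for example a patient or a core) fall into the same fold."""
    groups = sorted(set(group_ids))
    random.Random(seed).shuffle(groups)
    fold_of_group = {g: i % n_folds for i, g in enumerate(groups)}
    return [fold_of_group[g] for g in group_ids]

def patchwise_folds(n_samples, n_folds, seed=0):
    """Assign folds to samples at random, ignoring any grouping."""
    order = list(range(n_samples))
    random.Random(seed).shuffle(order)
    folds = [0] * n_samples
    for rank, idx in enumerate(order):
        folds[idx] = rank % n_folds
    return folds

def leaked_groups(group_ids, folds):
    """Return the groups whose samples are spread over more than one fold,
    i.e. groups that would appear in both training and test data."""
    seen = defaultdict(set)
    for g, f in zip(group_ids, folds):
        seen[g].add(f)
    return {g for g, fs in seen.items() if len(fs) > 1}

# Hypothetical toy data: patch i belongs to patient patient_of_patch[i].
patient_of_patch = [p for p in range(40) for _ in range(10)]

patient_folds = grouped_folds(patient_of_patch, n_folds=20)
patch_folds = patchwise_folds(len(patient_of_patch), n_folds=20)

print("patients split across folds (leave-patients-out):",
      len(leaked_groups(patient_of_patch, patient_folds)))
print("patients split across folds (leave-patches-out):",
      len(leaked_groups(patient_of_patch, patch_folds)))
```

Passing core identifiers instead of patient identifiers to `grouped_folds` gives the leave-cores-out variant. With patch-wise folds, patches from the same patient end up on both sides of the train/test split, which is the kind of leakage the study compares against.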

MAIN OUTCOMES AND MEASURES

The classifier performance was evaluated by its accuracy, sensitivity, and specificity in detection of cancer (benign vs cancer) and in low-grade vs high-grade differentiation (Gleason grade 3 vs grades 4-5). The statistical significance analysis was performed using the McNemar test. The agreement level between pathologists and the classifier was quantified using a quadratic-weighted κ statistic.
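As an illustration of these measures, the sketch below computes accuracy, sensitivity, and specificity for a binary task and runs an exact two-sided McNemar test on two sets of predictions for the same samples. It is a minimal example under stated assumptions, not the study's analysis code: the abstract does not say whether the exact or the chi-square form of the test was used, and the function names and toy labels are hypothetical.

```python
from math import comb

def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity, and specificity with 1 = cancer, 0 = benign.
    Assumes both classes occur in y_true."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

def mcnemar_exact(y_true, pred_a, pred_b):
    """Exact two-sided McNemar test on the discordant pairs.
    Returns (b, c, p_value), where b counts samples that A classifies
    correctly and B does not, and c counts the reverse."""
    b = sum(1 for t, a, o in zip(y_true, pred_a, pred_b) if a == t and o != t)
    c = sum(1 for t, a, o in zip(y_true, pred_a, pred_b) if a != t and o == t)
    n = b + c
    if n == 0:
        return b, c, 1.0
    k = min(b, c)
    # Two-sided binomial tail probability under H0: P(discordant favours A) = 0.5
    p_value = min(1.0, 2 * sum(comb(n, i) for i in range(k + 1)) / 2 ** n)
    return b, c, p_value

# Hypothetical toy labels and predictions from two evaluation settings.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
pred_a = [1, 1, 1, 0, 0, 0, 0, 1]
pred_b = [1, 0, 0, 0, 0, 1, 1, 1]

metrics_a = binary_metrics(y_true, pred_a)
b, c, p_value = mcnemar_exact(y_true, pred_a, pred_b)
print(metrics_a, b, c, p_value)
```

The McNemar test uses only the samples on which the two sets of predictions disagree about correctness, which is why it suits paired comparisons on the same test items.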

RESULTS

On 333 tissue microarray cores from 231 participants with prostate cancer (mean [SD] age, 63.2 [6.3] years), 20-fold leave-patches-out CV resulted in mean (SD) accuracy of 97.8% (1.2%), sensitivity of 98.5% (1.0%), and specificity of 97.5% (1.2%) for classifying benign patches vs cancerous patches. By contrast, 20-fold leave-patients-out CV resulted in mean (SD) accuracy of 85.8% (4.3%), sensitivity of 86.3% (4.1%), and specificity of 85.5% (7.2%). Similarly, 20-fold leave-cores-out CV resulted in mean (SD) accuracy of 86.7% (3.7%), sensitivity of 87.2% (4.0%), and specificity of 87.7% (5.5%). Results of McNemar tests showed that the leave-patches-out CV accuracy, sensitivity, and specificity were significantly higher than those for both leave-patients-out CV and leave-cores-out CV. Similar results were observed for classifying low-grade cancer vs high-grade cancer. When trained on a single expert, the overall agreement in grading between pathologists and the classifier ranged from 0.38 to 0.58; when trained using the majority vote among all experts, it was 0.60.
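The agreement values above are quadratic-weighted κ scores, and the multiexpert labels come from a majority vote. The sketch below shows one way both could be computed, assuming classes are encoded as integers 0 to 3 (benign, Gleason 3, 4, 5). The encoding, the tie-breaking rule, and the function names are assumptions made for illustration; the abstract does not describe how ties were resolved.

```python
from collections import Counter

def majority_vote(labels):
    """Most frequent label among annotators. Ties go to the lowest label,
    which is an arbitrary choice made for this sketch."""
    counts = Counter(labels)
    top = max(counts.values())
    return min(label for label, count in counts.items() if count == top)

def quadratic_weighted_kappa(rater_a, rater_b, n_classes):
    """Cohen's kappa with quadratic disagreement weights
    w_ij = (i - j)^2 / (n_classes - 1)^2 for integer labels 0..n_classes-1.
    Assumes the ratings are not all in one identical class."""
    n = len(rater_a)
    observed = [[0] * n_classes for _ in range(n_classes)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(n_classes))
              for j in range(n_classes)]
    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(n_classes):
        for j in range(n_classes):
            w = (i - j) ** 2 / (n_classes - 1) ** 2
            weighted_observed += w * observed[i][j]
            weighted_expected += w * hist_a[i] * hist_b[j] / n
    return 1.0 - weighted_observed / weighted_expected

# Hypothetical annotations: one row per patch, one entry per annotator.
votes_per_patch = [[0, 0, 1, 0], [1, 2, 2, 2], [3, 3, 2, 3], [1, 1, 2, 2]]
consensus = [majority_vote(v) for v in votes_per_patch]

# Hypothetical classifier output for the same four patches.
predicted = [0, 2, 3, 2]
kappa = quadratic_weighted_kappa(consensus, predicted, n_classes=4)
print(consensus, kappa)
```

Quadratic weights penalize a benign vs grade 5 disagreement far more than a grade 3 vs grade 4 disagreement, which matches the ordinal nature of the grading scale.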

CONCLUSIONS AND RELEVANCE

Results of this study suggest that in prostate cancer classification from histopathologic images, patch-wise CV and single-expert training and evaluation may lead to a biased estimation of classifier's performance. To allow reproducibility and facilitate comparison between automatic classification methods, studies in the field should evaluate their performance using patient-based CV and multiexpert data. Some of these conclusions may be generalizable to other histopathologic applications and to other applications of machine learning in medicine.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f388/6484626/c82f60f0ccae/jamanetwopen-2-e190442-g001.jpg

Similar articles

Comparison of Artificial Intelligence Techniques to Evaluate Performance of a Classifier for Automatic Grading of Prostate Cancer From Digitized Histopathologic Images.

JAMA Netw Open. 2019 Mar 1;2(3):e190442. doi: 10.1001/jamanetworkopen.2019.0442.

Automatic grading of prostate cancer in digitized histopathology images: Learning from multiple experts.

Med Image Anal. 2018 Dec;50:167-180. doi: 10.1016/j.media.2018.09.005. Epub 2018 Sep 24.

Development and Validation of an Artificial Intelligence-Powered Platform for Prostate Cancer Grading and Quantification.

JAMA Netw Open. 2021 Nov 1;4(11):e2132554. doi: 10.1001/jamanetworkopen.2021.32554.

Deep Learning-Based Gleason Grading of Prostate Cancer From Histopathology Images-Role of Multiscale Decision Aggregation and Data Augmentation.

IEEE J Biomed Health Inform. 2020 May;24(5):1413-1426. doi: 10.1109/JBHI.2019.2944643. Epub 2019 Sep 30.

Development of a Deep Learning Algorithm for the Histopathologic Diagnosis and Gleason Grading of Prostate Cancer Biopsies: A Pilot Study.

Eur Urol Focus. 2021 Mar;7(2):347-351. doi: 10.1016/j.euf.2019.11.003. Epub 2019 Nov 22.

Artificial intelligence for diagnosis and grading of prostate cancer in biopsies: a population-based, diagnostic study.

Lancet Oncol. 2020 Feb;21(2):222-232. doi: 10.1016/S1470-2045(19)30738-7. Epub 2020 Jan 8.

Development and Validation of a Deep Learning Algorithm for Gleason Grading of Prostate Cancer From Biopsy Specimens.

JAMA Oncol. 2020 Sep 1;6(9):1372-1380. doi: 10.1001/jamaoncol.2020.2485.

WeGleNet: A weakly-supervised convolutional neural network for the semantic segmentation of Gleason grades in prostate histology images.

Comput Med Imaging Graph. 2021 Mar;88:101846. doi: 10.1016/j.compmedimag.2020.101846. Epub 2021 Jan 13.

An Artificial Intelligence-based Support Tool for Automation and Standardisation of Gleason Grading in Prostate Biopsies.

Eur Urol Focus. 2021 Sep;7(5):995-1001. doi: 10.1016/j.euf.2020.11.001. Epub 2020 Dec 7.

Automated deep-learning system for Gleason grading of prostate cancer using biopsies: a diagnostic study.

Lancet Oncol. 2020 Feb;21(2):233-241. doi: 10.1016/S1470-2045(19)30739-9. Epub 2020 Jan 8.

Cited by

The state of the art in artificial intelligence and digital pathology in prostate cancer.

Nat Rev Urol. 2025 Aug 4. doi: 10.1038/s41585-025-01070-2.

The Role of Artificial Intelligence in the Evaluation of Prostate Pathology.

Pathol Int. 2025 May;75(5):213-220. doi: 10.1111/pin.70015. Epub 2025 Apr 14.

Comparison of Pathologist and Artificial Intelligence-based Grading for Prediction of Metastatic Outcomes After Radical Prostatectomy.

Eur Urol Oncol. 2025 Feb;8(1):9-13. doi: 10.1016/j.euo.2024.08.004. Epub 2024 Sep 3.

External validation of an artificial intelligence model for Gleason grading of prostate cancer on prostatectomy specimens.

BJU Int. 2025 Jan;135(1):133-139. doi: 10.1111/bju.16464. Epub 2024 Jul 11.

Harnessing artificial intelligence for prostate cancer management.

Cell Rep Med. 2024 Apr 16;5(4):101506. doi: 10.1016/j.xcrm.2024.101506. Epub 2024 Apr 8.

Development and validation of a clinic machine-learning nomogram for the prediction of risk stratifications of prostate cancer based on functional subsets of peripheral lymphocyte.

J Transl Med. 2023 Jul 12;21(1):465. doi: 10.1186/s12967-023-04318-w.

A comparison of performance between a deep learning model with residents for localization and classification of intracranial hemorrhage.

Sci Rep. 2023 Jun 20;13(1):9975. doi: 10.1038/s41598-023-37114-z.

A systematic review and meta-analysis of artificial intelligence diagnostic accuracy in prostate cancer histology identification and grading.

Prostate Cancer Prostatic Dis. 2023 Dec;26(4):681-692. doi: 10.1038/s41391-023-00673-3. Epub 2023 Apr 25.

Future of Artificial Intelligence Applications in Cancer Care: A Global Cross-Sectional Survey of Researchers.

Curr Oncol. 2023 Mar 16;30(3):3432-3446. doi: 10.3390/curroncol30030260.

Exploring the Use of Artificial Intelligence in the Management of Prostate Cancer.

Curr Urol Rep. 2023 May;24(5):231-240. doi: 10.1007/s11934-023-01149-6. Epub 2023 Feb 18.
