Wulczyn Ellery, Nagpal Kunal, Symonds Matthew, Moran Melissa, Plass Markus, Reihs Robert, Nader Farah, Tan Fraser, Cai Yuannan, Brown Trissia, Flament-Auvigne Isabelle, Amin Mahul B, Stumpe Martin C, Müller Heimo, Regitnig Peter, Holzinger Andreas, Corrado Greg S, Peng Lily H, Chen Po-Hsuan Cameron, Steiner David F, Zatloukal Kurt, Liu Yun, Mermel Craig H
Google Health, Palo Alto, CA USA.
Medical University of Graz, Graz, Austria.
Commun Med (Lond). 2021 Jun 30;1:10. doi: 10.1038/s43856-021-00005-3. eCollection 2021.
Gleason grading of prostate cancer is an important prognostic factor, but suffers from poor reproducibility, particularly among non-subspecialist pathologists. Although artificial intelligence (A.I.) tools have demonstrated Gleason grading on par with expert pathologists, it remains an open question whether and to what extent A.I. grading translates to better prognostication.
In this study, we developed a system to predict prostate cancer-specific mortality via A.I.-based Gleason grading and subsequently evaluated its ability to risk-stratify patients on an independent retrospective cohort of 2807 prostatectomy cases from a single European center with 5-25 years of follow-up (median: 13, interquartile range 9-17).
Here, we show that the A.I.'s risk scores produced a C-index of 0.84 (95% CI 0.80-0.87) for prostate cancer-specific mortality. Upon discretizing these risk scores into risk groups analogous to pathologist Grade Groups (GG), the A.I. has a C-index of 0.82 (95% CI 0.78-0.85). On the subset of cases with a GG provided in the original pathology report (n = 1517), the A.I.'s C-indices are 0.87 and 0.85 for continuous and discrete grading, respectively, compared to 0.79 (95% CI 0.71-0.86) for GG obtained from the reports. These represent improvements of 0.08 (95% CI 0.01-0.15) and 0.07 (95% CI 0.00-0.14), respectively.
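For readers unfamiliar with the metric, the C-index (concordance index) reported above can be illustrated with a minimal sketch. This is the standard Harrell formulation for right-censored data, not the study's own evaluation code; the function name `concordance_index` and the toy inputs are hypothetical. A pair of patients counts as comparable when the one with the shorter follow-up time had the event (here, prostate cancer-specific death); it is concordant when that patient also has the higher risk score, and ties in risk score earn half credit. Pairs with tied follow-up times are skipped for simplicity, which is an assumption of this sketch.

```python
from itertools import combinations


def concordance_index(times, events, risks):
    """Harrell's C-index for right-censored survival data (illustrative sketch).

    times  -- follow-up time per patient
    events -- 1/True if the event was observed, 0/False if censored
    risks  -- predicted risk score; higher means an earlier event is expected
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Let `a` be the patient with the shorter follow-up time.
        a, b = (i, j) if times[i] <= times[j] else (j, i)
        if times[a] == times[b]:
            continue  # simplification: skip pairs with tied times
        if not events[a]:
            continue  # shorter follow-up was censored, so the order is unknown
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0
        elif risks[a] == risks[b]:
            concordant += 0.5  # tied risk scores earn half credit
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data, invented purely for illustration.
times = [5, 8, 12, 20]
events = [1, 1, 0, 0]
risks = [0.9, 0.4, 0.6, 0.1]
print(concordance_index(times, events, risks))  # 4 of 5 comparable pairs are concordant
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering. Discretizing continuous scores into a few risk groups creates more tied pairs, each of which earns only half credit, which is one plausible reason a discrete grading can score slightly lower than the continuous score it was derived from.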
Our results suggest that A.I.-based Gleason grading can lead to effective risk stratification, and warrants further evaluation for improving disease management.