Else Kroener Fresenius Center for Digital Health, Technical University Dresden, Dresden, Germany.
Department of Pathology, Christian-Albrechts University, Kiel, Germany.
Gastric Cancer. 2023 Sep;26(5):708-720. doi: 10.1007/s10120-023-01398-x. Epub 2023 Jun 3.
The Laurén classification is widely used for histological subtyping of gastric cancer (GC). However, it is prone to interobserver variability, and its prognostic value remains controversial. Deep learning (DL)-based assessment of hematoxylin and eosin (H&E)-stained slides could provide an additional layer of clinically relevant information, but it has not been systematically assessed in GC.
We aimed to train, test and externally validate a deep learning-based classifier for GC histology subtyping using routine H&E stained tissue sections from gastric adenocarcinomas and to assess its potential prognostic utility.
We trained a binary classifier to distinguish intestinal from diffuse type GC on whole slide images from a subset of the TCGA cohort (N = 166) using attention-based multiple instance learning. Ground-truth labels for these 166 GCs were assigned by two expert pathologists. We deployed the model on two external GC patient cohorts, one from Europe (N = 322) and one from Japan (N = 243). We assessed classification performance using the Area Under the Receiver Operating Characteristic Curve (AUROC). We assessed the prognostic value of the DL-based classifier (overall, cancer-specific and disease-free survival) with uni- and multivariate Cox proportional hazard models and Kaplan-Meier curves with log-rank test statistics.
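As an illustration of the pooling step in attention-based multiple instance learning, the following is a minimal sketch in plain Python. It is not the study's implementation: the function name `attention_mil_pool`, the parameters `V` and `w`, and the toy tile features are all hypothetical. It assumes the common formulation in which each tile embedding receives a score w · tanh(V h), the scores are normalized with a softmax, and the slide-level representation is the attention-weighted sum of tile embeddings. In a trained model, `V` and `w` would be learned and the pooled vector would feed a classification head.

```python
import math

def attention_mil_pool(tiles, V, w):
    """Pool K tile feature vectors into one slide-level vector.

    tiles: list of K feature vectors, each of length D
    V:     L x D projection matrix given as a list of rows (hypothetical, normally learned)
    w:     length-L scoring vector (hypothetical, normally learned)
    Returns (attention_weights, slide_vector).
    """
    scores = []
    for h in tiles:
        # Hidden representation u = tanh(V h)
        hidden = [math.tanh(sum(v * x for v, x in zip(row, h))) for row in V]
        # Scalar attention score w . u
        scores.append(sum(a * b for a, b in zip(w, hidden)))

    # Numerically stable softmax over the tiles of one slide
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    attn = [e / total for e in exps]

    # Slide-level vector: attention-weighted sum of tile features
    dim = len(tiles[0])
    slide = [sum(a * h[d] for a, h in zip(attn, tiles)) for d in range(dim)]
    return attn, slide

# Toy input: three tiles with two features each (illustrative values only)
toy_tiles = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
toy_V = [[1.0, 0.0], [0.0, 1.0]]
toy_w = [1.0, 1.0]
toy_attn, toy_slide = attention_mil_pool(toy_tiles, toy_V, toy_w)
```

With these toy parameters, the third tile receives the largest attention weight because both of its features are active, so it contributes most to the slide-level vector.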
Internal validation on the TCGA GC cohort with five-fold cross-validation achieved a mean AUROC of 0.93 ± 0.07. In external validation, the DL-based classifier stratified GC patients' 5-year survival better than pathologist-based Laurén classification for all survival endpoints, despite frequently divergent model-pathologist classifications. Univariate overall survival hazard ratios (HRs) for pathologist-based Laurén classification (diffuse type versus intestinal type) were 1.14 (95% confidence interval (CI) 0.66-1.44, p-value = 0.51) and 1.23 (95% CI 0.96-1.43, p-value = 0.09) in the Japanese and European cohorts, respectively. DL-based histology classification yielded HRs of 1.46 (95% CI 1.18-1.65, p-value < 0.005) and 1.41 (95% CI 1.20-1.57, p-value < 0.005) in the Japanese and European cohorts, respectively. In diffuse type GC (as defined by the pathologist), further classifying patients by the DL diffuse and intestinal classes provided superior survival stratification, which was statistically significant when combined with pathologist classification in both the Japanese cohort (overall survival log-rank test p-value < 0.005, HR 1.43 (95% CI 1.05-1.66, p-value = 0.03)) and the European cohort (overall survival log-rank test p-value < 0.005, HR 1.56 (95% CI 1.16-1.76, p-value < 0.005)).
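To make the "mean AUROC ± spread over five folds" summary concrete, the sketch below computes AUROC from its pairwise definition (the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, with ties counted as one half) and then summarizes per-fold values. The helper names `auroc` and `summarize_folds` and all numbers are illustrative assumptions, not data from the study. The abstract does not state whether the reported ± value is a standard deviation, so the use of the sample standard deviation here is also an assumption.

```python
from statistics import mean, stdev

def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 for the positive class, 0 for the negative class
            (which subtype counts as positive is an arbitrary choice)
    scores: model scores, higher meaning more likely positive
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    # A correctly ordered pair counts 1, a tie counts 0.5, otherwise 0
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def summarize_folds(fold_aurocs):
    """Return (mean, sample standard deviation) of per-fold AUROC values."""
    return mean(fold_aurocs), stdev(fold_aurocs)

# Toy example: four cases in a single fold (illustrative values only)
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = auroc(toy_labels, toy_scores)

# Toy per-fold AUROCs, not the study's values
toy_mean, toy_sd = summarize_folds([0.9, 1.0])
```

The pairwise loop is quadratic in the number of cases, which is acceptable for cohorts of a few hundred slides; a rank-based implementation would be the usual choice for larger data.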
Our study shows that gastric adenocarcinoma subtyping, with the pathologist's Laurén classification as ground truth, can be performed using current state-of-the-art DL techniques. DL-based histology typing appears to stratify patient survival better than histology typing by expert pathologists. DL-based GC histology typing has potential as an aid in subtyping. Further investigations are warranted to understand the biological mechanisms underlying the improved survival stratification despite the apparently imperfect classification by the DL algorithm.