Technische Hochschule Ingolstadt, Ingolstadt, Germany.
Gestalt Diagnostics, Spokane, USA.
Sci Rep. 2024 Nov 1;14(1):26273. doi: 10.1038/s41598-024-77244-6.
The count of mitotic figures (MFs) observed in hematoxylin and eosin (H&E)-stained slides is an important prognostic marker, as it is a measure of tumor cell proliferation. However, the identification of MFs is known to have low inter-rater agreement. In a computer-aided setting, deep learning algorithms can help to mitigate this, but they require large amounts of annotated data for training and validation. Furthermore, label noise introduced during the annotation process may impede the algorithms' performance. Unlike H&E, where identification of MFs is based mainly on morphological features, the mitosis-specific antibody phospho-histone H3 (PHH3) specifically highlights MFs. Counting MFs on slides stained against PHH3 leads to higher agreement among raters and has therefore recently been used as a ground truth for the annotation of MFs in H&E. However, because PHH3 facilitates the recognition of cells that cannot be identified as MFs from H&E staining alone, the use of this ground truth could introduce an interpretation shift and even label noise into the H&E-related dataset, impacting model performance. This study analyzes the impact of PHH3-assisted MF annotation on inter-rater reliability and object-level agreement through an extensive multi-rater experiment. Subsequently, MF detectors, including a novel dual-stain detector, were evaluated on the resulting datasets to investigate the influence of PHH3-assisted labeling on the models' performance. We found that the annotators' object-level agreement increased significantly when using PHH3-assisted labeling (F1: 0.53 to 0.74). However, this gain in label consistency did not translate into improved performance for H&E-based detectors, either when the PHH3-assisted labels were used for training or when they were used for evaluation. In contrast, the dual-stain detector was able to benefit from the higher consistency. This points to an information mismatch between the H&E and PHH3-stained images as the cause of this effect, which makes PHH3-assisted annotations poorly suited for use with H&E-based detectors. Based on our findings, we propose an improved PHH3-assisted labeling procedure.
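To make the object-level agreement figure concrete, the following is a minimal sketch of how an F1 score between two raters' point annotations could be computed. It is not the study's procedure: the function name `object_level_f1`, the greedy nearest-neighbor matching, the 25-pixel matching radius, and the convention of returning 1.0 when both raters mark nothing are all assumptions made for illustration.

```python
from math import hypot


def object_level_f1(points_a, points_b, max_dist=25.0):
    """F1 agreement between two raters' point annotations.

    points_a, points_b: lists of (x, y) coordinates, one per annotated cell.
    max_dist: hypothetical matching radius in pixels (an assumption).
    """
    # Collect all cross-rater pairs that lie within the matching radius,
    # sorted so that the closest pairs are matched first.
    pairs = sorted(
        (hypot(ax - bx, ay - by), i, j)
        for i, (ax, ay) in enumerate(points_a)
        for j, (bx, by) in enumerate(points_b)
        if hypot(ax - bx, ay - by) <= max_dist
    )

    # Greedy one-to-one matching: each annotation is used at most once.
    used_a, used_b = set(), set()
    for _, i, j in pairs:
        if i not in used_a and j not in used_b:
            used_a.add(i)
            used_b.add(j)

    tp = len(used_a)              # cells marked by both raters
    fn = len(points_a) - tp       # cells marked only by rater A
    fp = len(points_b) - tp       # cells marked only by rater B
    denom = 2 * tp + fp + fn
    # Assumed convention: two empty annotation sets agree perfectly.
    return 2 * tp / denom if denom else 1.0


rater_a = [(10, 10), (100, 100), (200, 200)]
rater_b = [(12, 11), (105, 98), (400, 400)]
print(round(object_level_f1(rater_a, rater_b), 3))  # 0.667
```

In this toy example two of the three cells are matched, giving F1 = 2·2 / (2·2 + 1 + 1) ≈ 0.667. Because F1 is symmetric in false positives and false negatives, swapping the two raters yields the same value, which makes it usable as a pairwise agreement measure.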