Baker Gabrielle M, Bret-Mounet Vanessa C, Wang Tengteng, Veta Mitko, Zheng Hanqiao, Collins Laura C, Eliassen A Heather, Tamimi Rulla M, Heng Yujing J
Department of Pathology, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA, USA.
Channing Division of Network Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA.
J Pathol Inform. 2022 Jun 28;13:100118. doi: 10.1016/j.jpi.2022.100118. eCollection 2022.
Digital pathology can efficiently assess immunohistochemistry (IHC) data on tissue microarrays (TMAs). Yet it remains important to evaluate the comparability of data acquired by different software applications and to validate these data against pathologists' manual interpretation. In this study, we compared the IHC quantification of 5 clinical breast cancer biomarkers, namely estrogen receptor (ER), progesterone receptor (PR), human epidermal growth factor receptor 2 (HER2), epidermal growth factor receptor (EGFR), and cytokeratin 5/6 (CK5/6), across 3 software applications (Definiens Tissue Studio, inForm, and QuPath), and benchmarked the results against pathologists' manual scores. IHC expression for each marker was evaluated across 4 TMAs consisting of 935 breast tumor tissue cores from 367 women within the Nurses' Health Studies, with each woman contributing three 0.6-mm cores. The correlation and agreement between manual and software-derived results were primarily assessed using Spearman's ρ, percentage agreement, and area under the curve (AUC). At the TMA core level, correlations between manual and software-derived scores were highest for HER2 (ρ ranging from 0.75 to 0.79), followed by ER (0.69-0.71), PR (0.67-0.72), CK5/6 (0.43-0.47), and EGFR (0.38-0.45). At the case level, there were good correlations between manual and software-derived scores for all 5 markers (ρ ranging from 0.43 to 0.82), with QuPath showing the highest correlations. Software-derived scores were highly comparable to each other (ρ ranging from 0.80 to 0.99). The average percentage agreements between manual and software-derived scores were excellent for ER (90.8%-94.5%) and PR (78.2%-85.2%), moderate for HER2 (65.4%-77.0%), highly variable for EGFR (48.2%-82.8%), and poor for CK5/6 (22.4%-45.0%). All AUCs across markers and software applications were ≥0.83. The 3 software applications were highly comparable to each other and to manual scores in quantifying these 5 markers. QuPath consistently produced the best performance, indicating that this open-source software is an excellent alternative for future use.
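The abstract names its three agreement metrics without defining them. As a rough illustration only (not the authors' analysis code and not how Definiens Tissue Studio, inForm, or QuPath compute anything), the Python sketch below computes Spearman's ρ as the Pearson correlation of average ranks, percentage agreement between categorical calls, and AUC as the probability that a reference-positive item receives a higher score than a reference-negative one. All function names, the example scores, and the 1% positivity cutoff are hypothetical assumptions; the abstract does not state the scoring categories, cutoffs, or the binary reference used for AUC, so using the manual positive/negative call as that reference is also an assumption.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the end of a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def percent_agreement(calls_a, calls_b):
    """Percentage of paired items given the same category by both raters."""
    same = sum(a == b for a, b in zip(calls_a, calls_b))
    return 100.0 * same / len(calls_a)


def auc(scores, labels):
    """AUC as P(score of a positive > score of a negative); ties count as 0.5."""
    pos = [s for s, lab in zip(scores, labels) if lab]
    neg = [s for s, lab in zip(scores, labels) if not lab]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical % positive nuclei per core; NOT data from the study.
    manual = [0, 10, 50, 80, 90, 5]
    software = [2, 15, 40, 85, 95, 1]
    cutoff = 1  # assumed positivity cutoff (%), for illustration only
    manual_pos = [m >= cutoff for m in manual]
    software_pos = [s >= cutoff for s in software]

    print(round(spearman_rho(manual, software), 3))  # 0.943
    print(round(percent_agreement(manual_pos, software_pos), 1))  # 83.3
    print(auc(software, manual_pos))  # 0.8
```

The toy data show why the metrics can diverge: rank order is nearly preserved (high ρ), yet a single core near the cutoff flips category, which lowers percentage agreement and AUC. This is consistent in spirit with the abstract's pattern of reasonable correlations alongside poor agreement for some markers, though the sketch makes no claim about the study's actual data.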