Department of Pathology and Laboratory Medicine, Zucker School of Medicine at Hofstra Northwell, United States of America.
Biostatistics Unit, Feinstein Institute for Medical Research, Northwell Health, United States of America.
Ann Diagn Pathol. 2019 Dec;43:151420. doi: 10.1016/j.anndiagpath.2019.151420. Epub 2019 Nov 7.
Colorectal carcinoma is one of the most commonly diagnosed malignancies. Many prognostic factors relate to clinical course and disease progression, including tumor stage, metastasis, and tumor budding. In 2016, the International Tumor Budding Consensus Conference (ITBCC) established a uniform method for assessing tumor budding, which includes a 3-tier grading system. Previously there was no uniform consensus, although general grading practice was based on a 2-tier system. Because tumor budding is considered to have prognostic value, the accuracy and reproducibility of its assessment are vital. This study examines interobserver agreement in the scoring of tumor budding.
A total of 233 cases of colorectal carcinoma diagnosed in our health system were retrospectively analyzed, and the routine H&E-stained slides of these cases were collected. One slide representative of tumor budding was selected per case. Four investigators with different levels of experience and expertise evaluated the selected slide of each case for tumor budding. Scoring was based on the ITBCC protocol. Clinicopathological data were collected for each case and analyzed against the tumor budding scores. Each investigator's tumor budding scores and the consensus score were compared with patient and tumor characteristics, including patient survival, tumor grade, tumor stage, and lymph node status.
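For readers unfamiliar with the ITBCC protocol referred to above, the 3-tier grade can be sketched as a simple threshold rule. This is only an illustration, not the study's code: the function name `itbcc_grade` is hypothetical, the cutoffs (Bd1: 0-4, Bd2: 5-9, Bd3: 10 or more buds) are those of the published ITBCC recommendation rather than anything stated in this abstract, and the input is assumed to be a bud count already normalized to a 0.785 mm² hotspot field.

```python
def itbcc_grade(buds_per_field: int) -> str:
    """Map a hotspot bud count to an ITBCC 3-tier grade.

    Assumption: buds_per_field is an integer count already normalized
    to a 0.785 mm^2 field, as the ITBCC recommendation describes.
    """
    if buds_per_field < 0:
        raise ValueError("bud count cannot be negative")
    if buds_per_field <= 4:
        return "Bd1"  # low budding: 0-4 buds
    if buds_per_field <= 9:
        return "Bd2"  # intermediate budding: 5-9 buds
    return "Bd3"      # high budding: 10 or more buds


if __name__ == "__main__":
    for count in (0, 4, 5, 9, 10, 23):
        print(count, itbcc_grade(count))
```

Because each grade boundary sits on a single bud, two observers who differ by one bud near a cutoff can assign different grades, which is one plausible source of the disagreement the study measures.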
Inter-observer agreement among the 4 pathologists was calculated using Gwet's Agreement Coefficient (AC) with associated 95% confidence intervals. Overall, tumor budding scores varied among pathologists (Gwet's AC = 0.25 and 0.326 for the 3-tier and 2-tier grading systems, respectively). The 2-tier system thus showed higher reliability than the 3-tier system. Tumor stage was significantly associated with budding score for every individual investigator and for the consensus value (p < 0.001).
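The agreement statistic can be illustrated with a minimal sketch of Gwet's unweighted AC1 for several raters. The abstract does not say whether the unweighted (AC1) or ordinal-weighted (AC2) form was used, nor how the confidence intervals were computed, so this is an assumption-laden illustration and not the authors' analysis. The function name `gwet_ac1` and the toy ratings are hypothetical, and every case is assumed to have been rated by at least two raters.

```python
from collections import Counter


def gwet_ac1(ratings, categories):
    """Gwet's unweighted AC1 for multiple raters.

    ratings:    one list per case holding each rater's category
                (assumption: at least two ratings per case).
    categories: every possible category (at least two); each observed
                rating must appear in this list.
    """
    n = len(ratings)
    q = len(categories)
    if n == 0 or q < 2:
        raise ValueError("need at least one case and two categories")

    agreement_sum = 0.0
    prevalence = {k: 0.0 for k in categories}
    for row in ratings:
        r = len(row)
        if r < 2:
            raise ValueError("each case needs at least two ratings")
        counts = Counter(row)
        # Fraction of rater pairs that agree on this case.
        agreement_sum += sum(c * (c - 1) for c in counts.values()) / (r * (r - 1))
        # Running average share of raters choosing each category.
        for k, c in counts.items():
            prevalence[k] += c / r / n

    p_observed = agreement_sum / n
    p_chance = sum(p * (1 - p) for p in prevalence.values()) / (q - 1)
    return (p_observed - p_chance) / (1 - p_chance)


if __name__ == "__main__":
    # Hypothetical toy data: 4 raters grading 3 cases on a 3-tier scale.
    toy = [["Bd1", "Bd1", "Bd2", "Bd1"],
           ["Bd3", "Bd2", "Bd2", "Bd3"],
           ["Bd1", "Bd2", "Bd3", "Bd1"]]
    print(gwet_ac1(toy, ["Bd1", "Bd2", "Bd3"]))
```

To compare the two grading schemes on the same cases, one would first recode the 3-tier labels into two groups and call the function again with two categories. How the study mapped cases onto the 2-tier system is not stated here, so that recoding is left out.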
Inter-observer agreement in the assessment of tumor budding in colorectal carcinoma is low. This suggests that tumor budding is difficult to grade uniformly and that the current classification system needs improvement. The older 2-tier system (Hase et al.) yielded slightly higher inter-observer agreement than the recently proposed 3-tier grading system (ITBCC, 2016), though agreement was suboptimal with both. Notably, observers with subspecialty GI training and more work experience had higher inter-observer agreement, and subspecialty training tended to increase agreement more than overall work experience. In addition, our exploratory results showed an association between tumor budding score and tumor stage. Although the 3-tier system offers a more refined classification, it resulted in lower agreement in tumor budding assessment. Clearly, more work is needed on the identification and quantification of tumor buds.