From the Departments of Pathology, Nagasaki University Graduate School of Biomedical Sciences, Nagasaki, Japan (Lami, K. Tanaka, Fukuoka), and Kameda Medical Center, Kamogawa, Japan (Bychkov).
Arch Pathol Lab Med. 2023 Aug 1;147(8):885-895. doi: 10.5858/arpa.2022-0051-OA.
CONTEXT: Accurate identification of lung adenocarcinoma histologic subtypes is important for determining prognosis but can be challenging because the diagnostic features overlap, leading to considerable interobserver variability.
OBJECTIVE: To provide an overview of the diagnostic agreement among pathologists for lung adenocarcinoma subtypes and to create a ground truth, using a clustering approach, for downstream computational applications.
DESIGN: Three sets of lung adenocarcinoma histologic images with different evaluation levels (small patches, areas with relatively uniform histology, and whole slide images) were reviewed by 17 international expert lung pathologists and 1 pathologist in training. Each image was classified into one or several lung adenocarcinoma subtypes.
RESULTS: Among the 4702 patches of the first set, 1742 (37%) had an overall consensus among all pathologists. The overall Fleiss κ score for the agreement of all subtypes was 0.58. Using cluster analysis, pathologists were hierarchically grouped into 2 clusters, with κ scores of 0.588 and 0.563 in clusters 1 and 2, respectively. Similar results were obtained for the second and third sets, with fair-to-moderate agreement. Patches from the first 2 sets that obtained the consensus of the 18 pathologists were retrieved to form consensus patches and were regarded as the ground truth of lung adenocarcinoma subtypes.
CONCLUSIONS: Our observations highlight discrepancies among experts when assessing lung adenocarcinoma subtypes. However, a number of consensus patches could subsequently be retrieved from each cluster; these can be used as ground truth for downstream computational pathology applications, with minimal influence from interobserver variability.
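The two quantities reported in the results, the Fleiss κ score and the set of unanimously labeled patches, can be illustrated with a minimal sketch. It is not the authors' code. It assumes each pathologist assigns exactly one subtype per patch, whereas the study allowed one or several subtypes per image. The function names `fleiss_kappa` and `consensus_indices` and the toy labels are hypothetical and chosen only for illustration.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss kappa for a list of items, each a list of labels (one per rater).

    Assumption: every item is rated by the same number of raters and each
    rater gives exactly one label per item.
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(item) != n_raters for item in ratings):
        raise ValueError("each item needs the same number (>= 2) of ratings")

    counts = [Counter(item) for item in ratings]
    categories = {label for item in ratings for label in item}

    # Observed agreement: mean proportion of agreeing rater pairs per item.
    per_item = [
        (sum(c * c for c in cnt.values()) - n_raters) / (n_raters * (n_raters - 1))
        for cnt in counts
    ]
    p_observed = sum(per_item) / n_items

    # Chance agreement: sum of squared overall category proportions.
    total = n_items * n_raters
    p_expected = sum(
        (sum(cnt[cat] for cnt in counts) / total) ** 2 for cat in categories
    )
    if p_expected == 1.0:
        return float("nan")  # only one category was ever used; kappa undefined
    return (p_observed - p_expected) / (1.0 - p_expected)


def consensus_indices(ratings):
    """Indices of items on which all raters chose the same label."""
    return [i for i, item in enumerate(ratings) if len(set(item)) == 1]


# Toy data (illustrative only): 4 patches, 3 raters, 2 subtypes.
toy = [
    ["acinar", "acinar", "acinar"],
    ["papillary", "papillary", "papillary"],
    ["acinar", "acinar", "papillary"],
    ["acinar", "papillary", "papillary"],
]
kappa = fleiss_kappa(toy)
unanimous = consensus_indices(toy)
```

On the toy data, two of the four patches are unanimous, so only those two would enter a consensus set under this simplified single-label assumption.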