From Biostatistics, Arbor Research Collaborative for Health, Ann Arbor, Michigan (Dr Zee); the Departments of Pathology (Dr Hodgin), Internal Medicine (Dr Mariani), and Biostatistics (Dr Gillespie), University of Michigan, Ann Arbor; Arbor Research Collaborative for Health, Ann Arbor, Michigan (Dr Mariani); the Department of Pathology & Immunology, Washington University, St Louis, Missouri (Dr Gaut); the Departments of Pathology and Laboratory Medicine (Dr Palmer) and Medicine (Dr Holzman), University of Pennsylvania, Philadelphia; the Department of Pathology, Johns Hopkins University, Baltimore, Maryland (Drs Bagnasco and Rosenberg); the Kidney Diseases Branch, National Institute of Diabetes and Digestive and Kidney Diseases, Bethesda, Maryland (Dr Rosenberg); the Laboratory of Pathology, National Cancer Institute, Bethesda, Maryland (Dr Hewitt); and the Department of Pathology, University of Miami, Miami, Florida (Dr Barisoni).
Arch Pathol Lab Med. 2018 May;142(5):613-625. doi: 10.5858/arpa.2017-0181-OA. Epub 2018 Feb 19.
Context: Testing reproducibility is critical for the development of methodologies for morphologic assessment. Our previous study using the descriptor-based Nephrotic Syndrome Study Network Digital Pathology Scoring System (NDPSS) on glomerular images revealed variable reproducibility.

Objective: To test reproducibility and feasibility of alternative scoring strategies for digital morphologic assessment of glomeruli and explore use of alternative agreement statistics.

Design: The original NDPSS was modified (NDPSS1 and NDPSS2) to evaluate (1) independent scoring of each individual biopsy level, (2) use of continuous measures, (3) groupings of individual descriptors into classes and subclasses prior to scoring, and (4) indication of pathologists' confidence/uncertainty for any given score. Three and 5 pathologists scored 157 and 79 glomeruli using the NDPSS1 and NDPSS2, respectively. Agreement was tested using conventional (Cohen κ) and alternative (Gwet agreement coefficient 1 [AC]) agreement statistics and compared with previously published data (original NDPSS).

Results: Overall, pathologists' uncertainty was low, favoring application of the Gwet AC. Greater agreement was achieved using the Gwet AC compared with the Cohen κ across all scoring methodologies. Mean (standard deviation) differences in agreement estimates using the NDPSS1 and NDPSS2 compared with the single-level original NDPSS were -0.09 (0.17) and -0.17 (0.17), respectively. Using the Gwet AC, 79% of the original NDPSS descriptors had good or excellent agreement. Pathologist feedback indicated the NDPSS1 and NDPSS2 were time-consuming.

Conclusions: The NDPSS1 and NDPSS2 increased pathologists' scoring burden without improving reproducibility. Use of alternative agreement statistics was strongly supported. We suggest using the original NDPSS on whole slide images for glomerular morphology assessment and for guiding future automated technologies.
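To illustrate why the Gwet AC can report higher agreement than the Cohen κ on the same scores, the following is a minimal sketch of both statistics for the simplest case of two raters and one categorical descriptor. It is not the study's analysis code: the study compared 3 and 5 pathologists, whereas this sketch uses the standard two-rater formulas, and the function name `agreement_stats` and the ratings are hypothetical, invented only for illustration.

```python
from collections import Counter


def agreement_stats(ratings_a, ratings_b):
    """Return (observed agreement, Cohen kappa, Gwet AC1) for two raters.

    Hypothetical helper for illustration; uses the standard two-rater formulas.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both raters must score the same non-empty set of items")

    n = len(ratings_a)
    categories = sorted(set(ratings_a) | set(ratings_b))
    q = len(categories)
    if q < 2:
        raise ValueError("at least two categories must be observed")

    # Observed agreement: fraction of items given the same score by both raters.
    p_obs = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n

    count_a = Counter(ratings_a)
    count_b = Counter(ratings_b)

    # Cohen kappa: chance agreement is the product of each rater's marginals.
    pe_kappa = sum((count_a[c] / n) * (count_b[c] / n) for c in categories)

    # Gwet AC1: chance agreement is based on the average prevalence of each
    # category across both raters, scaled by 1 / (q - 1).
    prevalence = {c: (count_a[c] + count_b[c]) / (2 * n) for c in categories}
    pe_ac1 = sum(p * (1 - p) for p in prevalence.values()) / (q - 1)

    kappa = (p_obs - pe_kappa) / (1 - pe_kappa)
    ac1 = (p_obs - pe_ac1) / (1 - pe_ac1)
    return p_obs, kappa, ac1


# Invented data: 100 glomeruli scored for a rare descriptor.
# 90 scored absent by both, 2 present by both, 4 + 4 discordant.
rater_a = ["absent"] * 90 + ["present"] * 2 + ["present"] * 4 + ["absent"] * 4
rater_b = ["absent"] * 90 + ["present"] * 2 + ["absent"] * 4 + ["present"] * 4

p_obs, kappa, ac1 = agreement_stats(rater_a, rater_b)
print(f"observed agreement = {p_obs:.2f}")
print(f"Cohen kappa        = {kappa:.2f}")
print(f"Gwet AC1           = {ac1:.2f}")
```

With these invented ratings the two raters agree on 92 of 100 glomeruli, yet the Cohen κ is about 0.29 while the Gwet AC1 is about 0.91. The gap comes from the chance-agreement term: when a descriptor is rare, the κ chance term approaches the observed agreement, which pushes κ down even though the raters rarely disagree.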