Elfer Katherine, Dudgeon Sarah, Garcia Victor, Blenman Kim, Hytopoulos Evangelos, Wen Si, Li Xiaoxian, Ly Amy, Werness Bruce, Sheth Manasi S, Amgad Mohamed, Gupta Rajarsi, Saltz Joel, Hanna Matthew G, Ehinger Anna, Peeters Dieter, Salgado Roberto, Gallas Brandon D
United States Food and Drug Administration, Center for Devices and Radiological Health, Office of Science and Engineering Laboratories, Division of Imaging Diagnostics & Software Reliability, Silver Spring, Maryland, United States.
National Institutes of Health, National Cancer Institute, Division of Cancer Prevention, Cancer Prevention Fellowship Program, Bethesda, Maryland, United States.
J Med Imaging (Bellingham). 2022 Jul;9(4):047501. doi: 10.1117/1.JMI.9.4.047501. Epub 2022 Jul 27.
Validation of artificial intelligence (AI) algorithms in digital pathology with a reference standard is necessary before widespread clinical use, but few examples focus on creating a reference standard based on pathologist annotations. This work assesses the results of a pilot study that collects density estimates of stromal tumor-infiltrating lymphocytes (sTILs) in breast cancer biopsy specimens. This work will inform the creation of a validation dataset for the evaluation of AI algorithms fit for a regulatory purpose. Collaborators and crowdsourced pathologists contributed glass slides, digital images, and annotations. Here, "annotations" refer to any marks, segmentations, measurements, or labels a pathologist adds to a report, image, region of interest (ROI), or biological feature. Pathologists estimated sTILs density in 640 ROIs from hematoxylin and eosin stained slides of 64 patients via two modalities: an optical light microscope and two digital image viewing platforms. The pilot study generated 7373 sTILs density estimates from 29 pathologists. Analysis of annotations found the variability of density estimates per ROI increases with the mean; the root mean square differences were 4.46, 14.25, and 26.25 as the mean density ranged from 0% to 10%, 11% to 40%, and 41% to 100%, respectively. The pilot study informs three areas of improvement for future work: technical workflows, annotation platforms, and agreement analysis methods. Upgrades to the workflows and platforms will improve operability and increase annotation speed and consistency. Exploratory data analysis demonstrates the need to develop new statistical approaches for agreement. The pilot study dataset and analysis methods are publicly available to allow community feedback. The development and results of the validation dataset will be publicly available to serve as an instructive tool that can be replicated by developers and researchers.
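The abstract reports root mean square differences within three ranges of mean sTILs density but does not spell out the computation. The following is a minimal sketch of one plausible reading, not the authors' published method: it assumes the statistic is taken over all pairs of pathologist estimates within each ROI, and that each ROI is assigned to a range by its rounded mean estimate. The function name `rms_difference_by_bin`, the `BINS` constant, and the sample values are hypothetical and are not drawn from the pilot study dataset.

```python
from itertools import combinations
from math import sqrt
from statistics import mean

# Density ranges (percent) quoted in the abstract; how the study handled
# bin edges is an assumption here.
BINS = [(0, 10), (11, 40), (41, 100)]


def rms_difference_by_bin(estimates_by_roi):
    """Return the RMS pairwise difference of density estimates per bin.

    estimates_by_roi maps an ROI identifier to a list of sTILs density
    estimates (0-100), one per pathologist reading.
    """
    squared_diffs = {b: [] for b in BINS}
    for estimates in estimates_by_roi.values():
        if len(estimates) < 2:
            continue  # a single reading has no pairwise difference
        roi_mean = round(mean(estimates))  # assumption: bin by rounded mean
        for low, high in BINS:
            if low <= roi_mean <= high:
                squared_diffs[(low, high)].extend(
                    (a - b) ** 2 for a, b in combinations(estimates, 2)
                )
                break
    # Bins without any pairs are reported as None rather than 0.
    return {b: (sqrt(mean(v)) if v else None) for b, v in squared_diffs.items()}


# Made-up readings for three ROIs, one falling in each range.
example = {
    "roi_a": [0, 5, 10],
    "roi_b": [20, 30, 40],
    "roi_c": [50, 90],
}
result = rms_difference_by_bin(example)
for bin_range, value in result.items():
    print(bin_range, value)
```

Pooling squared differences across the ROIs in a bin before taking the square root weights ROIs with more readers more heavily. Averaging a per-ROI statistic instead would be an equally defensible choice and would give different numbers.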