Marini Niccolò, Otalora Sebastian, Wodzinski Marek, Tomassini Selene, Dragoni Aldo Franco, Marchand-Maillet Stephane, Morales Juan Pedro Dominguez, Duran-Lopez Lourdes, Vatrano Simona, Müller Henning, Atzori Manfredo
Information Systems Institute, University of Applied Sciences Western Switzerland (HES-SO Valais), Sierre, Switzerland.
Centre Universitaire d'Informatique, University of Geneva, Geneva, Switzerland.
J Pathol Inform. 2023 Jan 3;14:100183. doi: 10.1016/j.jpi.2022.100183. eCollection 2023.
Computational pathology targets the automatic analysis of Whole Slide Images (WSIs). WSIs are high-resolution digitized histopathology images of tissue stained with chemical reagents to highlight specific structures and scanned with whole slide scanners. Differences in acquisition parameters can lead to stain color heterogeneity, especially when samples are collected from several medical centers. Stain color heterogeneity often limits the robustness of methods developed to analyze WSIs, in particular Convolutional Neural Networks (CNNs), the state-of-the-art approach for most computational pathology tasks. It remains an unsolved problem, although several methods have been developed to alleviate it, such as Hue-Saturation-Contrast (HSC) color augmentation and stain augmentation. This paper presents Data-Driven Color Augmentation (DDCA), a method that improves the efficiency of color augmentation by increasing the reliability of the samples used to train computational pathology models. During CNN training, a database of over 2 million H&E color variations collected from private and public datasets serves as a reference for discarding augmented data whose color distributions do not correspond to realistic data. DDCA is applied to HSC color augmentation, stain augmentation and H&E-adversarial networks in colon and prostate cancer classification tasks. It is then compared with 11 state-of-the-art baseline methods for handling color heterogeneity, showing that it can substantially improve classification performance on unseen data that include heterogeneous color variations.
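The filtering idea described in the abstract (generate an augmented sample, compare its colors with a reference database, discard it if no realistic match exists) can be illustrated with a minimal sketch. This is not the authors' implementation. The names `color_signature`, `is_realistic`, `random_color_shift` and `data_driven_augment` are hypothetical, the mean RGB value stands in for an estimated H&E stain representation, and the tolerance, retry limit and toy augmentation are assumptions chosen only for illustration.

```python
import numpy as np

def color_signature(img):
    """Hypothetical stand-in for an H&E stain estimate: mean RGB in [0, 1]."""
    return img.reshape(-1, 3).mean(axis=0)

def is_realistic(sig, reference, tol):
    """True if sig lies within tol (Euclidean distance) of any reference row."""
    return bool(np.min(np.linalg.norm(reference - sig, axis=1)) <= tol)

def random_color_shift(img, rng, strength=0.3):
    """Toy color augmentation (assumption): per-channel gain and offset."""
    gain = 1.0 + rng.uniform(-strength, strength, size=3)
    offset = rng.uniform(-strength, strength, size=3)
    return np.clip(img * gain + offset, 0.0, 1.0)

def data_driven_augment(img, reference, rng, tol=0.05, max_tries=20):
    """Resample augmentations until one matches the reference database."""
    for _ in range(max_tries):
        candidate = random_color_shift(img, rng)
        if is_realistic(color_signature(candidate), reference, tol):
            return candidate
    return img  # assumption: fall back to the unaugmented patch

# Usage with a tiny synthetic "database" of two reference signatures.
rng = np.random.default_rng(0)
patch = np.full((8, 8, 3), [0.8, 0.5, 0.7])
reference = np.array([[0.8, 0.5, 0.7], [0.75, 0.45, 0.72]])
out = data_driven_augment(patch, reference, rng)
```

In this sketch the rejection step is a nearest-neighbor distance check against the reference rows. A database of millions of entries would likely need an indexed or approximate search, which is outside what the abstract specifies.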