Schreiber B A, Denholm J, Gilbey J D, Schönlieb C-B, Soilleux E J
Department of Pathology, University of Cambridge, Cambridge, UK.
Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Cambridge, UK.
J Pathol Inform. 2023 Jul 19;14:100324. doi: 10.1016/j.jpi.2023.100324. eCollection 2023.
Around 1% of the population of the UK and North America have a diagnosis of coeliac disease (CD), which is caused by a damaging immune response in the small intestine. Assessing whether a patient has CD relies primarily on the examination of a duodenal biopsy, an unavoidably subjective process with poor inter-observer concordance. Wei et al. [11] developed a neural network-based method for diagnosing CD using a dataset of duodenal biopsy whole slide images (WSIs). As all of their training and validation data came from one source, there was no guarantee that their results would generalize to WSIs obtained from different scanners and laboratories. In this study, we compared the effects of applying stain normalization and stain jittering to the training data. We trained a deep neural network on 331 WSIs obtained with a Ventana scanner (WSIs; CD: ; normal: ) to classify the presence of CD. To test the effects of stain processing when validating on WSIs from different scanners and laboratories, the neural network was validated on 4 datasets: WSIs of slides scanned on a Ventana scanner (WSIs; CD: ; normal: ), WSIs of the same slides rescanned on a Hamamatsu scanner (WSIs; CD: ; normal: ), WSIs of the same slides rescanned on an Aperio scanner (WSIs; CD: ; normal: ), and WSIs of different slides scanned on an Aperio scanner (WSIs; CD: ; normal: ). Without stain processing, the F1 scores of the neural network were , , and when validating on the Ventana validation WSIs, the Hamamatsu and Aperio rescans of the Ventana validation WSIs, and the Aperio WSIs from a different source, respectively. With stain normalization, the performance of the neural network improved significantly, with respective F1 scores of , , , and . Stain jittering performed better than stain normalization when validating on data from the same source (F1 score ), but worse than stain normalization when validating on WSIs from different scanners (F1 scores , and ).
This study shows the importance of stain processing, in particular stain normalization, when training machine learning models on duodenal biopsy WSIs to ensure generalizability between different scanners and laboratories.
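The abstract does not say how stain jittering was implemented. The sketch below shows one common formulation, given here purely as an assumption: the RGB tile is converted to optical density, unmixed into stain channels with fixed colour-deconvolution vectors, each channel is randomly scaled and shifted, and the result is mapped back to RGB. The function name `jitter_stains`, the parameter `sigma`, and the stain vectors (the widely quoted Ruifrok and Johnston values for haematoxylin, eosin and DAB) are illustrative choices, not the authors' settings.

```python
import numpy as np

# Assumed stain vectors (rows: haematoxylin, eosin, DAB); not taken from the paper.
RGB_FROM_HED = np.array([
    [0.65, 0.70, 0.29],
    [0.07, 0.99, 0.11],
    [0.27, 0.57, 0.78],
])
HED_FROM_RGB = np.linalg.inv(RGB_FROM_HED)


def jitter_stains(rgb, rng, sigma=0.05):
    """Randomly perturb stain concentrations of an RGB tile with values in [0, 1].

    `sigma` (hypothetical parameter) bounds the per-channel scale and shift.
    """
    # Beer-Lambert: optical density is the negative log of transmitted intensity.
    od = -np.log(np.clip(rgb, 1e-6, 1.0))
    # Unmix optical density into per-stain concentrations.
    hed = od @ HED_FROM_RGB
    # One random scale (alpha) and shift (beta) per stain channel.
    alpha = rng.uniform(1.0 - sigma, 1.0 + sigma, size=3)
    beta = rng.uniform(-sigma, sigma, size=3)
    hed = hed * alpha + beta
    # Remix the perturbed concentrations and return to intensity space.
    od_jittered = hed @ RGB_FROM_HED
    return np.clip(np.exp(-od_jittered), 0.0, 1.0)


rng = np.random.default_rng(0)
tile = rng.uniform(0.05, 1.0, size=(4, 4, 3))  # stand-in for a WSI tile
augmented = jitter_stains(tile, rng, sigma=0.05)
```

Jittering of this kind is applied only to training tiles, with fresh random draws each time, so that the network sees a range of stain appearances. Stain normalization instead maps every tile towards a single reference appearance.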
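All results in the abstract are reported as F1 scores. As a reminder of what that metric measures, here is a minimal sketch of the F1 computation for a binary CD-versus-normal classifier. The function name `f1_score` and the example labels are made up for illustration and do not reproduce any result from the study.

```python
def f1_score(y_true, y_pred, positive=1):
    """F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    denom = 2 * tp + fp + fn
    # Convention assumed here: return 0.0 when there are no positives at all.
    return 2 * tp / denom if denom else 0.0


# Hypothetical slide-level labels: 1 = CD, 0 = normal.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
score = f1_score(y_true, y_pred)  # TP=2, FP=1, FN=1, so F1 = 4/6
```

Because F1 ignores true negatives, it reflects how well CD slides are detected and does not reward a model simply for labelling most slides as normal.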