CEA, CNRS, MIRCen, Laboratoire Des Maladies Neurodégénératives, Université Paris-Saclay, Fontenay-aux-Roses, France.
Witsee, Paris, France.
Sci Rep. 2021 Nov 26;11(1):22973. doi: 10.1038/s41598-021-02344-6.
In preclinical research, histology images are produced using powerful optical microscopes that digitize entire sections at the cellular scale. Quantification of stained tissue relies on machine-learning-driven segmentation. However, such methods require many additional pieces of information, or features, which increase the quantity of data to process. As a result, the number of features becomes an obstacle to processing large series or massive histological images rapidly and robustly. Existing feature selection methods can reduce the amount of required information, but the selected subsets lack reproducibility. We propose a novel methodology that operates on high-performance computing (HPC) infrastructures and aims to find small and stable sets of features for fast and robust segmentation of high-resolution histological images. The selection has two steps: (1) selection at the scale of feature families (an intermediate pool of features, between feature spaces and individual features) and (2) feature selection performed on the pre-selected feature families. We show that the selected sets of features are stable for two different neuron stainings. To test different configurations, one of the datasets is a mono-subject dataset and the other is a multi-subject dataset. Furthermore, the feature selection results in a significant reduction of computation time and memory cost. This methodology will allow exhaustive histological studies at high resolution on HPC infrastructures for both preclinical and clinical research.
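The two-step idea described in the abstract (first keep whole feature families, then pick individual features inside the kept families, and check that the chosen subset is stable) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the relevance score (absolute Pearson correlation with the class label), the stability measure (mean pairwise Jaccard index over bootstrap resamples), the function names, and the toy feature names are all hypothetical. The published method may use different criteria and classifiers.

```python
import random
from statistics import mean

def abs_corr(xs, ys):
    """Absolute Pearson correlation; 0.0 when either input is constant."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / (sxx * syy) ** 0.5

def two_step_select(columns, labels, families, n_families, n_features):
    """columns: {feature: values}; families: {family: [feature names]}."""
    # Hypothetical relevance score per feature (an assumption, see lead-in).
    scores = {f: abs_corr(v, labels) for f, v in columns.items()}
    # Step 1: rank families by the mean score of their member features.
    ranked = sorted(families,
                    key=lambda fam: mean(scores[f] for f in families[fam]),
                    reverse=True)
    kept = ranked[:n_families]
    # Step 2: rank individual features inside the kept families only.
    pool = [f for fam in kept for f in families[fam]]
    return set(sorted(pool, key=scores.get, reverse=True)[:n_features])

def jaccard(a, b):
    """Jaccard index of two sets; defined as 1.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def stability(columns, labels, families, n_families, n_features,
              n_runs=10, seed=0):
    """Mean pairwise Jaccard index of subsets chosen on bootstrap resamples."""
    rng = random.Random(seed)
    n = len(labels)
    subsets = []
    for _ in range(n_runs):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        cols = {f: [v[i] for i in idx] for f, v in columns.items()}
        labs = [labels[i] for i in idx]
        subsets.append(
            two_step_select(cols, labs, families, n_families, n_features))
    pairs = [(a, b) for i, a in enumerate(subsets) for b in subsets[i + 1:]]
    return mean(jaccard(a, b) for a, b in pairs)

# Toy data (hypothetical): two "color" features track the label,
# two "texture" features are unrelated to it.
labels = [0, 1, 0, 1, 0, 1, 0, 1]
columns = {
    "color_a": [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0],
    "color_b": [0.1, 0.9, 0.2, 0.8, 0.0, 1.0, 0.3, 0.7],
    "texture_a": [1, 1, 0, 0, 1, 1, 0, 0],
    "texture_b": [0, 0, 0, 0, 1, 1, 1, 1],
}
families = {"color": ["color_a", "color_b"],
            "texture": ["texture_a", "texture_b"]}

selected = two_step_select(columns, labels, families, n_families=1, n_features=2)
score = stability(columns, labels, families, n_families=1, n_features=2)
print(sorted(selected), round(score, 3))
```

Filtering at the family level first shrinks the pool that the second step has to rank, which is one plausible reason a two-step scheme could save computation; the stability score gives a simple number for how reproducible the selected subset is across resamples.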