Institute for Automation and Applied Informatics, Karlsruhe Institute of Technology, Karlsruhe, Baden-Württemberg, Germany.
PLoS One. 2022 Feb 8;17(2):e0263656. doi: 10.1371/journal.pone.0263656. eCollection 2022.
Deep learning increasingly accelerates biomedical research, deploying neural networks for multiple tasks, such as image classification, object detection, and semantic segmentation. However, neural networks are commonly trained in a supervised manner on large-scale, labeled datasets. These prerequisites raise issues in biomedical image recognition, as datasets are generally small-scale, challenging to obtain, expensive to label, and frequently heterogeneously labeled. Heterogeneous labels are a particular challenge for supervised methods: if not all classes are labeled for an individual sample, supervised deep learning approaches can only learn on the subset of samples that share a common set of labeled classes. Consequently, biomedical image recognition engineers need to be frugal with their label and ground truth requirements. This paper discusses the effects of frugal labeling and proposes to train neural networks for multi-class semantic segmentation on heterogeneously labeled data using a novel objective function that combines a class-asymmetric loss with the Dice loss. The approach is demonstrated for three use cases: training on the sparse ground truth of a heterogeneously labeled dataset, training within a transfer learning setting, and merging multiple heterogeneously labeled datasets. For this purpose, a small-scale, multi-class biomedical semantic segmentation dataset, heartSeg, is utilized. The heartSeg dataset builds on the medaka fish's role as a cardiac model system. Automating image recognition and semantic segmentation enables high-throughput experiments and is essential for biomedical research. Our approach and analysis show competitive results in supervised training regimes and encourage frugal labeling within biomedical image recognition.
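The abstract does not spell out the objective function, so the following is only a minimal sketch of the general idea under stated assumptions, not the paper's loss. It assumes softmax outputs and one-hot targets of shape (N, C, H, W), plus a hypothetical per-sample boolean `class_mask` of shape (N, C) marking which classes were annotated; unannotated classes are simply excluded from both terms. The class-asymmetric term is stood in for by a hypothetical binary cross-entropy with separate weights `w_pos` and `w_neg` on positive and negative pixels. The names `masked_dice_loss`, `masked_asymmetric_bce`, `combined_loss`, and the weighting `alpha` are illustrative only.

```python
import numpy as np


def masked_dice_loss(probs, target, class_mask, eps=1e-6):
    """Soft Dice loss over the (sample, class) pairs that were annotated.

    probs, target: float arrays of shape (N, C, H, W); class_mask: bool (N, C).
    """
    intersection = (probs * target).sum(axis=(2, 3))          # (N, C)
    denom = probs.sum(axis=(2, 3)) + target.sum(axis=(2, 3))  # (N, C)
    dice = (2.0 * intersection + eps) / (denom + eps)
    mask = class_mask.astype(float)
    # Unannotated classes get zero weight, so they contribute nothing to the
    # loss (and, in an autodiff framework, no gradient).
    return float(((1.0 - dice) * mask).sum() / max(mask.sum(), 1.0))


def masked_asymmetric_bce(probs, target, class_mask, w_pos=1.0, w_neg=0.5, eps=1e-7):
    """Hypothetical asymmetric term: per-class BCE with separate weights on
    positive (w_pos) and negative (w_neg) pixels, restricted to annotated classes."""
    p = np.clip(probs, eps, 1.0 - eps)
    per_pixel = -(w_pos * target * np.log(p) + w_neg * (1.0 - target) * np.log(1.0 - p))
    per_class = per_pixel.mean(axis=(2, 3))                   # (N, C)
    mask = class_mask.astype(float)
    return float((per_class * mask).sum() / max(mask.sum(), 1.0))


def combined_loss(probs, target, class_mask, alpha=0.5):
    """Weighted sum of the asymmetric term and the Dice term (alpha is illustrative)."""
    return (alpha * masked_asymmetric_bce(probs, target, class_mask)
            + (1.0 - alpha) * masked_dice_loss(probs, target, class_mask))


def softmax(logits, axis=1):
    """Numerically stable softmax over the class axis."""
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, c, h, w = 2, 3, 8, 8
    labels = rng.integers(0, c, size=(n, h, w))
    target = np.eye(c)[labels].transpose(0, 3, 1, 2)          # one-hot, (N, C, H, W)
    # Sample 0 lacks annotations for class 2; sample 1 lacks class 1.
    class_mask = np.array([[True, True, False], [True, False, True]])
    probs = softmax(rng.normal(size=(n, c, h, w)))
    print("random prediction:", combined_loss(probs, target, class_mask))
    print("perfect prediction:", combined_loss(target, target, class_mask))
```

Because masked classes contribute nothing, samples from several heterogeneously labeled datasets could in principle be mixed in one batch by giving each sample its own `class_mask` row; whether this matches the paper's merging strategy is an assumption.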