A new dataset for measuring the performance of blood vessel segmentation methods under distribution shifts.

Author information

Viana da Silva Matheus, de Carvalho Santos Natália, Ouellette Julie, Lacoste Baptiste, Comin Cesar H

Affiliations

Department of Computer Science, Federal University of São Carlos, São Carlos, Brazil.

São Carlos Institute of Physics, University of São Paulo, São Carlos, Brazil.

Publication information

PLoS One. 2025 May 27;20(5):e0322048. doi: 10.1371/journal.pone.0322048. eCollection 2025.

Abstract

Creating a dataset for training supervised machine learning algorithms can be a demanding task. This is especially true for blood vessel segmentation since one or more specialists are usually required for image annotation, and creating ground truth labels for just a single image can take up to several hours. In addition, it is paramount that the annotated samples represent well the different conditions that might affect the imaged tissues as well as possible changes in the image acquisition process. This can only be achieved by considering samples that are typical in the dataset as well as atypical, or even outlier, samples. We introduce VessMAP, an annotated and highly heterogeneous blood vessel segmentation dataset acquired by carefully sampling relevant images from a large non-annotated dataset containing fluorescence microscopy images. Each image of the dataset contains metadata information regarding the contrast, amount of noise, density, and intensity variability of the vessels. Prototypical and atypical samples were carefully selected from the base dataset using the available metadata information, thus defining an assorted set of images that can be used for measuring the performance of segmentation algorithms on samples that are highly distinct from each other. We show that datasets traditionally used for developing new blood vessel segmentation algorithms tend to have low heterogeneity. Thus, neural networks trained on as few as four samples can generalize well to all other samples. In contrast, the training samples used for the VessMAP dataset can be critical to the generalization capability of a neural network. For instance, training on samples with good contrast leads to models with poor inference quality. Interestingly, while some training sets lead to Dice scores as low as 0.59, a careful selection of the training samples results in a Dice score of 0.85. 
Thus, the VessMAP dataset can be used for the development of new active learning methods for selecting relevant samples for manual annotation as well as for analyzing the robustness of segmentation models to distribution shifts of the data.
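
The Dice scores quoted above (0.59 and 0.85) measure the overlap between a predicted vessel mask and its manual annotation. The abstract does not give the authors' evaluation code, so the following is only a minimal sketch of the standard Dice coefficient, 2|A ∩ B| / (|A| + |B|), on binary masks. The function name `dice_score`, the nested-list mask representation, and the choice to score two empty masks as 1.0 are assumptions made for illustration.

```python
def dice_score(pred, target):
    """Dice coefficient between two binary masks given as nested lists of 0/1."""
    # Flatten both masks and coerce every pixel to 0 or 1.
    flat_pred = [int(bool(v)) for row in pred for v in row]
    flat_target = [int(bool(v)) for row in target for v in row]
    if len(flat_pred) != len(flat_target):
        raise ValueError("masks must have the same number of pixels")

    # Pixels labeled as vessel in both masks.
    intersection = sum(p & t for p, t in zip(flat_pred, flat_target))
    total = sum(flat_pred) + sum(flat_target)

    # Assumed convention: two empty masks count as perfect agreement.
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


# Toy 2x3 masks (hypothetical data, not taken from VessMAP).
pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]
score = dice_score(pred, target)
print(f"Dice = {score:.3f}")
```

In this toy case each mask marks three vessel pixels and two of them coincide, so the score is 2·2 / (3 + 3). A score of 1.0 means the masks are identical, and 0.0 means they share no vessel pixels.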

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4847/12112280/a3504248f720/pone.0322048.g001.jpg
