

Main challenges on the curation of large scale datasets for pancreas segmentation using deep learning in multi-phase CT scans: Focus on cardinality, manual refinement, and annotation quality.

Affiliations

Department of Electronics, Information, and Bioengineering, Politecnico di Milano, Piazza Leonardo da Vinci 32, Milano, 20133, Italy; Fondazione MIAS (AIMS Academy), Piazza dell'Ospedale Maggiore 3, Milano, 20162, Italy.

Department of Electronics, Information, and Bioengineering, Politecnico di Milano, Piazza Leonardo da Vinci 32, Milano, 20133, Italy.

Publication Information

Comput Med Imaging Graph. 2024 Oct;117:102434. doi: 10.1016/j.compmedimag.2024.102434. Epub 2024 Sep 13.

Abstract

Accurate segmentation of the pancreas in computed tomography (CT) holds paramount importance in diagnostics, surgical planning, and interventions. Recent studies have proposed supervised deep-learning models for segmentation, but their efficacy relies on the quality and quantity of the training data. Most such works employed small-scale public datasets without demonstrating generalization to external datasets. This study explored the optimization of pancreas segmentation accuracy by pinpointing the ideal dataset size, understanding resource implications, examining the impact of manual refinement, and assessing the influence of anatomical subregions. We present the AIMS-1300 dataset, encompassing 1,300 CT scans whose manual annotation by medical experts required 938 hours. A 2.5D UNet was implemented to assess the impact of training sample size on segmentation accuracy by partitioning the original AIMS-1300 dataset into 11 subsets of progressively increasing size. The findings revealed that training sets exceeding 440 CTs did not lead to better segmentation performance; in contrast, nnU-Net and UNet with Attention Gate reached a plateau at 585 CTs. Generalization tests on the publicly available AMOS-CT dataset confirmed this outcome. As the size of the AIMS-1300 training partition increases, the number of error slices decreases, reaching a minimum at 730 CTs for the AIMS-1300 dataset and 440 CTs for the AMOS-CT dataset. As the dataset size increased, segmentation metrics on the AIMS-1300 and AMOS-CT datasets improved more on the head of the pancreas than on the body and tail. By carefully considering the task and the characteristics of the available data, researchers can develop deep-learning models without sacrificing performance even with limited data. This could accelerate the development and deployment of artificial intelligence tools for pancreatic surgery and other surgical data science applications.
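
The scaling experiment described in the abstract follows a common learning-curve protocol: draw training subsets of increasing size from the annotated pool, train a model on each, and find the size beyond which test accuracy stops improving. Below is a minimal Python sketch of that idea; the `train_and_evaluate` stub, the nested random subsets, the equal spacing of the 11 subset sizes, and the plateau tolerance are all illustrative assumptions, not details taken from the paper.

```python
# A minimal sketch of a dataset-scaling (learning-curve) experiment, assuming
# nested random subsets and a placeholder training routine; the paper's actual
# sampling scheme and 2.5D U-Net training details are not specified here.
import random

POOL_SIZE = 1300  # AIMS-1300 annotated cases (a held-out test set is assumed)
SUBSET_SIZES = [round(POOL_SIZE * k / 11) for k in range(1, 12)]  # 11 growing subsets

def train_and_evaluate(case_ids):
    """Hypothetical stub: train a segmentation model on `case_ids`
    and return its mean Dice score on a fixed test set."""
    raise NotImplementedError

def learning_curve(seed=0):
    """Train on nested subsets of increasing size and record test Dice."""
    rng = random.Random(seed)
    pool = list(range(POOL_SIZE))
    rng.shuffle(pool)  # one fixed ordering, so each subset contains the previous
    return [(n, train_and_evaluate(pool[:n])) for n in SUBSET_SIZES]

def plateau_size(curve, tol=0.005):
    """Smallest training-set size whose Dice is within `tol` of the best observed."""
    best = max(dice for _, dice in curve)
    return min(n for n, dice in curve if dice >= best - tol)
```

Using nested subsets (each larger subset contains the smaller ones) keeps the curve monotone in data content, so any plateau reflects diminishing returns from additional annotation rather than sampling noise between disjoint subsets.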
