A Large-scale Synthetic Pathological Dataset for Deep Learning-enabled Segmentation of Breast Cancer.

Affiliations

Department of Computer Science, University of North Carolina at Charlotte, Charlotte, NC, 28262, USA.

Sensebrain Research, San Jose, CA, 95131, USA.

Publication information

Sci Data. 2023 Apr 21;10(1):231. doi: 10.1038/s41597-023-02125-y.

DOI: 10.1038/s41597-023-02125-y

PMID: 37085533

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10121551/

Abstract

Training computer-vision models successfully relies heavily on large-scale, real-world images with annotations. Yet such annotation-ready datasets are difficult to curate in pathology because of privacy protection requirements and the excessive annotation burden. For computational pathology, synthetic data generation, curation, and annotation offer a cost-effective way to quickly provide the data diversity needed to boost model performance at different stages. In this study, we introduce a large-scale synthetic pathological image dataset paired with annotations for nuclei semantic segmentation, termed Synthetic Nuclei and annOtation Wizard (SNOW). SNOW is developed via a standardized workflow that applies an off-the-shelf image generator and nuclei annotator. The dataset contains 20k image tiles and 1,448,522 annotated nuclei in total and is released under the CC-BY license. We show that SNOW can be used in both supervised and semi-supervised training scenarios. Extensive results suggest that models trained on synthetic data are competitive under a variety of training settings, broadening the use of synthetic images for enhancing downstream data-driven clinical tasks.
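
For a rough sense of annotation density, the dataset figures quoted in the abstract can be turned into a per-tile average. The sketch below assumes that "20k" means exactly 20,000 tiles, which the abstract does not state. The 4x4 binary mask is hypothetical and not taken from SNOW; it only illustrates what a semantic segmentation annotation encodes (1 for nucleus pixels, 0 for background). The function names are illustrative as well.

```python
# Figures quoted in the abstract; 20_000 is an assumed reading of "20k".
TOTAL_TILES = 20_000
TOTAL_NUCLEI = 1_448_522


def mean_nuclei_per_tile(total_nuclei: int, total_tiles: int) -> float:
    """Average number of annotated nuclei per image tile."""
    if total_tiles <= 0:
        raise ValueError("total_tiles must be positive")
    return total_nuclei / total_tiles


def foreground_fraction(mask: list[list[int]]) -> float:
    """Share of pixels labelled as nucleus (1) in a binary semantic mask."""
    pixels = [value for row in mask for value in row]
    if not pixels:
        raise ValueError("mask must contain at least one pixel")
    return sum(1 for value in pixels if value == 1) / len(pixels)


# Hypothetical 4x4 mask, for illustration only; not taken from SNOW.
toy_mask = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
    [1, 0, 0, 0],
]

average = mean_nuclei_per_tile(TOTAL_NUCLEI, TOTAL_TILES)
print(f"{average:.1f} nuclei per tile on average")
print(f"{foreground_fraction(toy_mask):.4f} of toy-mask pixels are nucleus")
```

Under that assumption the average comes to roughly 72 annotated nuclei per tile. The true value depends on the exact tile count in the released dataset.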

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3a4c/10121551/53ec487f577d/41597_2023_2125_Fig1_HTML.jpg
