CEM500K, a large-scale heterogeneous unlabeled cellular electron microscopy image dataset for deep learning.

Affiliations

Center for Molecular Microscopy, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, United States.

Cancer Research Technology Program, Frederick National Laboratory for Cancer Research, Frederick, United States.

Publication information

Elife. 2021 Apr 8;10:e65894. doi: 10.7554/eLife.65894.

DOI: 10.7554/eLife.65894

PMID: 33830015

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8032397/

Abstract

Automated segmentation of cellular electron microscopy (EM) datasets remains a challenge. Supervised deep learning (DL) methods that rely on region-of-interest (ROI) annotations yield models that fail to generalize to unrelated datasets. Newer unsupervised DL algorithms require relevant pre-training images; however, pre-training on currently available EM datasets is computationally expensive and shows little value for unseen biological contexts, as these datasets are large and homogeneous. To address this issue, we present CEM500K, a nimble 25 GB dataset of 0.5 × 10^6 unique 2D cellular EM images curated from nearly 600 three-dimensional (3D) and 10,000 two-dimensional (2D) images from >100 unrelated imaging projects. We show that models pre-trained on CEM500K learn features that are biologically relevant and resilient to meaningful image augmentations. Critically, we evaluate transfer learning from these pre-trained models on six publicly available and one newly derived benchmark segmentation task and report state-of-the-art results on each. We release the CEM500K dataset, pre-trained models and curation pipeline for model building and further expansion by the EM community. Data and code are available at https://www.ebi.ac.uk/pdbe/emdb/empiar/entry/10592/ and https://git.io/JLLTz.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f4ec/8032397/aa7095b2ec0d/elife-65894-fig1.jpg
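
The abstract states that CEM500K consists of unique 2D images curated from 3D and 2D source images, but it does not describe the procedure. As a rough illustration only, the sketch below assumes one simple way such a step could work: cut non-overlapping 2D patches from every slice of a 3D volume and drop exact duplicates by hashing the pixel bytes. The function names, the patch size, and the exact-match criterion are hypothetical and are not taken from the paper or its released curation pipeline.

```python
import hashlib
from typing import Iterator, List

Image2D = List[List[int]]   # rows of 8-bit grayscale pixel values (0-255)
Volume3D = List[Image2D]    # stack of 2D slices along the z axis


def iter_patches(volume: Volume3D, size: int) -> Iterator[Image2D]:
    """Yield non-overlapping size x size patches from every z-slice."""
    for plane in volume:
        height, width = len(plane), len(plane[0])
        for top in range(0, height - size + 1, size):
            for left in range(0, width - size + 1, size):
                yield [row[left:left + size] for row in plane[top:top + size]]


def patch_digest(patch: Image2D) -> str:
    """Hash the raw pixel bytes so that identical patches share one key."""
    raw = bytes(pixel for row in patch for pixel in row)
    return hashlib.sha256(raw).hexdigest()


def unique_patches(volume: Volume3D, size: int) -> List[Image2D]:
    """Keep the first occurrence of each distinct patch, in scan order."""
    seen = set()
    kept = []
    for patch in iter_patches(volume, size):
        key = patch_digest(patch)
        if key not in seen:
            seen.add(key)
            kept.append(patch)
    return kept


# Toy volume (an assumption for illustration): two identical 4x4 slices
# followed by one slice with inverted intensities.
plane_a = [[(r * 4 + c) % 256 for c in range(4)] for r in range(4)]
plane_b = [[255 - value for value in row] for row in plane_a]
toy_volume = [plane_a, plane_a, plane_b]

patches = unique_patches(toy_volume, size=2)
print(f"{len(patches)} unique patches kept")
```

Exact hashing only removes byte-identical patches. Adjacent slices of a real EM volume are usually similar rather than identical, so a practical pipeline would likely need a near-duplicate criterion instead; that choice is outside what the abstract specifies.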
