
A large expert-curated cryo-EM image dataset for machine learning protein particle picking.

Affiliations

Department of Electrical Engineering and Computer Science, NextGen Precision Health, University of Missouri, Columbia, MO, 65211, USA.

Laboratory for BioMolecular Structure (LBMS), Brookhaven National Laboratory, Upton, NY, 11973, USA.

Publication information

Sci Data. 2023 Jun 22;10(1):392. doi: 10.1038/s41597-023-02280-2.

DOI:10.1038/s41597-023-02280-2
PMID:37349345
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10287764/
Abstract

Cryo-electron microscopy (cryo-EM) is a powerful technique for determining the structures of biological macromolecular complexes. Picking single-protein particles from cryo-EM micrographs is a crucial step in reconstructing protein structures. However, the widely used template-based particle picking process is labor-intensive and time-consuming. Though machine learning and artificial intelligence (AI) based particle picking can potentially automate the process, its development is hindered by lack of large, high-quality labelled training data. To address this bottleneck, we present CryoPPP, a large, diverse, expert-curated cryo-EM image dataset for protein particle picking and analysis. It consists of labelled cryo-EM micrographs (images) of 34 representative protein datasets selected from the Electron Microscopy Public Image Archive (EMPIAR). The dataset is 2.6 terabytes and includes 9,893 high-resolution micrographs with labelled protein particle coordinates. The labelling process was rigorously validated through 2D particle class validation and 3D density map validation with the gold standard. The dataset is expected to greatly facilitate the development of both AI and classical methods for automated cryo-EM protein particle picking.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5dc7/10287764/bc1faa04c4a9/41597_2023_2280_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5dc7/10287764/5b2b3c227c0b/41597_2023_2280_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5dc7/10287764/b12e7e8164fb/41597_2023_2280_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5dc7/10287764/9b4fc13c1542/41597_2023_2280_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5dc7/10287764/3405c772f213/41597_2023_2280_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5dc7/10287764/6f22b9503b00/41597_2023_2280_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5dc7/10287764/63a014857c0f/41597_2023_2280_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5dc7/10287764/fbd44ab52e3b/41597_2023_2280_Fig8_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5dc7/10287764/7a0072f26d3b/41597_2023_2280_Fig9_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5dc7/10287764/ed7a99553b96/41597_2023_2280_Fig10_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5dc7/10287764/3848396c9da3/41597_2023_2280_Fig11_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5dc7/10287764/ab3e61c45540/41597_2023_2280_Fig12_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5dc7/10287764/d7929fa311ed/41597_2023_2280_Fig13_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5dc7/10287764/ef2a42fc8e6d/41597_2023_2280_Fig14_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5dc7/10287764/74651e3687ef/41597_2023_2280_Fig15_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5dc7/10287764/4e47aa369084/41597_2023_2280_Fig16_HTML.jpg

Similar articles

1
A large expert-curated cryo-EM image dataset for machine learning protein particle picking.
Sci Data. 2023 Jun 22;10(1):392. doi: 10.1038/s41597-023-02280-2.
2
CryoPPP: A Large Expert-Labelled Cryo-EM Image Dataset for Machine Learning Protein Particle Picking.
bioRxiv. 2023 Feb 22:2023.02.21.529443. doi: 10.1101/2023.02.21.529443.
3
CryoTransformer: a transformer model for picking protein particles from cryo-EM micrographs.
Bioinformatics. 2024 Mar 4;40(3). doi: 10.1093/bioinformatics/btae109.
4
CryoTransformer: A Transformer Model for Picking Protein Particles from Cryo-EM Micrographs.
bioRxiv. 2023 Oct 23:2023.10.19.563155. doi: 10.1101/2023.10.19.563155.
5
AutoCryoPicker: an unsupervised learning approach for fully automated single particle picking in Cryo-EM images.
BMC Bioinformatics. 2019 Jun 13;20(1):326. doi: 10.1186/s12859-019-2926-y.
6
CryoVirusDB: A Labeled Cryo-EM Image Dataset for AI-Driven Virus Particle Picking.
bioRxiv. 2023 Dec 26:2023.12.25.573312. doi: 10.1101/2023.12.25.573312.
7
Deep-learning with synthetic data enables automated picking of cryo-EM particle images of biological macromolecules.
Bioinformatics. 2020 Feb 15;36(4):1252-1259. doi: 10.1093/bioinformatics/btz728.
8
CryoSegNet: accurate cryo-EM protein particle picking by integrating the foundational AI image segmentation model and attention-gated U-Net.
Brief Bioinform. 2024 May 23;25(4). doi: 10.1093/bib/bbae282.
9
Automatic post-picking using MAPPOS improves particle image detection from cryo-EM micrographs.
J Struct Biol. 2013 May;182(2):59-66. doi: 10.1016/j.jsb.2013.02.008. Epub 2013 Feb 21.
10
DeepPicker: A deep learning approach for fully automated particle picking in cryo-EM.
J Struct Biol. 2016 Sep;195(3):325-336. doi: 10.1016/j.jsb.2016.07.006. Epub 2016 Jul 14.

Cited by

1
UM-CPP: A Universal Model for Efficient Classification of Protein Particles in cryo-EM Micrographs with Feature Engineering.
ACS Omega. 2025 Jun 30;10(27):29131-29142. doi: 10.1021/acsomega.5c01660. eCollection 2025 Jul 15.
2
Multimodal deep learning integration of cryo-EM and AlphaFold3 for high-accuracy protein structure determination.
bioRxiv. 2025 Jul 3:2025.07.03.663071. doi: 10.1101/2025.07.03.663071.
3
Self-supervised learning for generalizable particle picking in cryo-EM micrographs.
Cell Rep Methods. 2025 Jul 21;5(7):101089. doi: 10.1016/j.crmeth.2025.101089. Epub 2025 Jul 7.
4
Structural Biology for Target Identification and Validation.
Methods Mol Biol. 2025;2905:17-49. doi: 10.1007/978-1-0716-4418-8_2.
5
Artificial intelligence in cryo-EM protein particle picking: recent advances and remaining challenges.
Brief Bioinform. 2024 Nov 22;26(1). doi: 10.1093/bib/bbaf011.
6
UPicker: a semi-supervised particle picking transformer method for cryo-EM micrographs.
Brief Bioinform. 2024 Nov 22;26(1). doi: 10.1093/bib/bbae636.
7
REliable PIcking by Consensus (REPIC): a consensus methodology for harnessing multiple cryo-EM particle pickers.
Commun Biol. 2024 Oct 31;7(1):1421. doi: 10.1038/s42003-024-07045-0.
8
Improving protein function prediction by learning and integrating representations of protein sequences and function labels.
Bioinform Adv. 2024 Aug 17;4(1):vbae120. doi: 10.1093/bioadv/vbae120. eCollection 2024.
9
Exploring treatment options in cancer: Tumor treatment strategies.
Signal Transduct Target Ther. 2024 Jul 17;9(1):175. doi: 10.1038/s41392-024-01856-7.
10
De novo atomic protein structure modeling for cryoEM density maps using 3D transformer and HMM.
Nat Commun. 2024 Jun 29;15(1):5511. doi: 10.1038/s41467-024-49647-6.

References

1
Combining protein sequences and structures with transformers and equivariant graph neural networks to predict protein function.
Bioinformatics. 2023 Jun 30;39(39 Suppl 1):i318-i325. doi: 10.1093/bioinformatics/btad208.
2
Deep learning for reconstructing protein structures from cryo-EM density maps: Recent advances and future directions.
Curr Opin Struct Biol. 2023 Apr;79:102536. doi: 10.1016/j.sbi.2023.102536. Epub 2023 Feb 9.
3
Improving Protein-Ligand Interaction Modeling with cryo-EM Data, Templates, and Deep Learning in 2021 Ligand Model Challenge.
Biomolecules. 2023 Jan 9;13(1):132. doi: 10.3390/biom13010132.
4
EMPIAR: the Electron Microscopy Public Image Archive.
Nucleic Acids Res. 2023 Jan 6;51(D1):D1503-D1511. doi: 10.1093/nar/gkac1062.
5
Ligand recognition and allosteric modulation of the human MRGPRX1 receptor.
Nat Chem Biol. 2023 Apr;19(4):416-422. doi: 10.1038/s41589-022-01173-6. Epub 2022 Oct 27.
6
Structural Basis for Binding of Potassium-Competitive Acid Blockers to the Gastric Proton Pump.
J Med Chem. 2022 Jun 9;65(11):7843-7853. doi: 10.1021/acs.jmedchem.2c00338. Epub 2022 May 23.
7
Structure of the bile acid transporter and HBV receptor NTCP.
Nature. 2022 Jun;606(7916):1021-1026. doi: 10.1038/s41586-022-04845-4. Epub 2022 May 17.
8
Cryo-EM structure of a SARS-CoV-2 omicron spike protein ectodomain.
Nat Commun. 2022 Mar 3;13(1):1214. doi: 10.1038/s41467-022-28882-9.
9
Structures of human pannexin-1 in nanodiscs reveal gating mediated by dynamic movement of the N terminus and phospholipids.
Sci Signal. 2022 Feb 8;15(720):eabg6941. doi: 10.1126/scisignal.abg6941.
10
Artificial intelligence in the prediction of protein-ligand interactions: recent advances and future directions.
Brief Bioinform. 2022 Jan 17;23(1). doi: 10.1093/bib/bbab476.