Unified access to up-to-date residue-level annotations from UniProtKB and other biological databases for PDB data.

Affiliations

Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK.

AstraZeneca, Biomedical Campus, 1 Francis Crick Ave, Trumpington, Cambridge, CB2 0AA, UK.

Publication information

Sci Data. 2023 Apr 12;10(1):204. doi: 10.1038/s41597-023-02101-6.

DOI:10.1038/s41597-023-02101-6
PMID:37045837
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10097656/
Abstract

More than 61,000 proteins have up-to-date correspondence between their amino acid sequences (UniProtKB) and their 3D structures (PDB), enabled by the Structure Integration with Function, Taxonomy and Sequences (SIFTS) resource. SIFTS also incorporates residue-level annotations from many other biological resources. SIFTS data are available in XML, CSV and TSV formats and through the PDBe REST API, but have always been maintained separately from the structure data (PDBx/mmCIF files) in the PDB archive. Here, we extended the wwPDB PDBx/mmCIF data dictionary with additional categories to accommodate SIFTS data and added the UniProtKB, Pfam, SCOP2, and CATH residue-level annotations directly to the PDBx/mmCIF files from the PDB archive. With the integrated UniProtKB annotations, these files now provide consistent residue numbering across different PDB entries, allowing easy comparison of structure models. The extended dictionary yields a more consistent, standardised metadata description without altering the core PDB information. This development provides up-to-date cross-reference information at the residue level, resulting in better data interoperability and supporting improved data analysis and visualisation.
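To illustrate how residue-level UniProtKB cross-references carried inside a PDBx/mmCIF file could be used to obtain consistent numbering, here is a minimal Python sketch. It rests on several assumptions that the abstract does not state: the item names (`pdbx_sifts_xref_db_name`, `pdbx_sifts_xref_db_acc`, `pdbx_sifts_xref_db_num`, `pdbx_sifts_xref_db_res`) are recalled from the extended dictionary and should be checked against it, the embedded loop is a hand-written stand-in rather than real PDB data, and the accession `P00000` is hypothetical. The toy parser handles only unquoted, whitespace-separated values; a real file would need a proper mmCIF parser.

```python
# Tiny hand-written stand-in for an _atom_site loop. NOT real PDB data;
# the accession P00000 and all values are hypothetical.
SAMPLE = """\
loop_
_atom_site.label_atom_id
_atom_site.label_comp_id
_atom_site.auth_asym_id
_atom_site.auth_seq_id
_atom_site.pdbx_sifts_xref_db_name
_atom_site.pdbx_sifts_xref_db_acc
_atom_site.pdbx_sifts_xref_db_num
_atom_site.pdbx_sifts_xref_db_res
CA MET A 1 UNP P00000 24 M
CA LYS A 2 UNP P00000 25 K
CA GLY A 3 ? ? ? ?
#
"""


def parse_loop(text):
    """Parse one simple loop_ block (whitespace-separated, no quoted values)."""
    names, rows = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line or line == "loop_":
            continue
        if line.startswith("#"):
            break  # end of this loop
        if line.startswith("_"):
            # Keep only the item name after the category prefix.
            names.append(line.split(".", 1)[1])
        else:
            rows.append(dict(zip(names, line.split())))
    return rows


def uniprot_numbering(rows):
    """Map (chain, author residue number) -> (UniProt accession, UniProt number)."""
    mapping = {}
    for row in rows:
        if row["pdbx_sifts_xref_db_name"] != "UNP":
            continue  # residue carries no UniProtKB cross-reference
        key = (row["auth_asym_id"], int(row["auth_seq_id"]))
        mapping[key] = (
            row["pdbx_sifts_xref_db_acc"],
            int(row["pdbx_sifts_xref_db_num"]),
        )
    return mapping


rows = parse_loop(SAMPLE)
mapping = uniprot_numbering(rows)
for key, value in sorted(mapping.items()):
    print(key, "->", value)
```

Building the same mapping for two entries of the same protein would let residues be matched by UniProt number rather than by each entry's own author numbering, which is the kind of comparison the abstract describes.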

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6269/10097656/ae55d9222813/41597_2023_2101_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6269/10097656/506bd883050c/41597_2023_2101_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6269/10097656/3c14ac8d303c/41597_2023_2101_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6269/10097656/3e2fbed8f0a0/41597_2023_2101_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6269/10097656/4262e67da36e/41597_2023_2101_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6269/10097656/122de33ec0c7/41597_2023_2101_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6269/10097656/1d2fd1e85ebc/41597_2023_2101_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6269/10097656/cb2ad217cd85/41597_2023_2101_Fig8_HTML.jpg

Similar articles

1
Unified access to up-to-date residue-level annotations from UniProtKB and other biological databases for PDB data.
Sci Data. 2023 Apr 12;10(1):204. doi: 10.1038/s41597-023-02101-6.
2
SIFTS: updated Structure Integration with Function, Taxonomy and Sequences resource allows 40-fold increase in coverage of structure-based annotations for proteins.
Nucleic Acids Res. 2019 Jan 8;47(D1):D482-D489. doi: 10.1093/nar/gky1114.
3
SIFTS: Structure Integration with Function, Taxonomy and Sequences resource.
Nucleic Acids Res. 2013 Jan;41(Database issue):D483-9. doi: 10.1093/nar/gks1258. Epub 2012 Nov 29.
4
Protein Data Bank (PDB): The Single Global Macromolecular Structure Archive.
Methods Mol Biol. 2017;1607:627-641. doi: 10.1007/978-1-4939-7000-1_26.
5
PDB NextGen Archive: centralizing access to integrated annotations and enriched structural information by the Worldwide Protein Data Bank.
Database (Oxford). 2024 May 27;2024. doi: 10.1093/database/baae041.
6
The Protein Data Bank Archive.
Methods Mol Biol. 2021;2305:3-21. doi: 10.1007/978-1-0716-1406-8_1.
7
Mapping PDB chains to UniProtKB entries.
Bioinformatics. 2005 Dec 1;21(23):4297-301. doi: 10.1093/bioinformatics/bti694. Epub 2005 Sep 27.
8
PDBe: improved accessibility of macromolecular structure data from PDB and EMDB.
Nucleic Acids Res. 2016 Jan 4;44(D1):D385-95. doi: 10.1093/nar/gkv1047. Epub 2015 Oct 17.
9
RCSB Protein Data Bank: Sustaining a living digital data resource that enables breakthroughs in scientific research and biomedical education.
Protein Sci. 2018 Jan;27(1):316-330. doi: 10.1002/pro.3331. Epub 2017 Nov 11.
10
PDBrenum: A webserver and program providing Protein Data Bank files renumbered according to their UniProt sequences.
PLoS One. 2021 Jul 6;16(7):e0253411. doi: 10.1371/journal.pone.0253411. eCollection 2021.

Articles citing this article

1
Functional (re)annotation of proteome using integrative sequence and AI-based structural approaches.
Curr Res Struct Biol. 2025 Aug 6;10:100172. doi: 10.1016/j.crstbi.2025.100172. eCollection 2025 Dec.
2
Dataset from a human-in-the-loop approach to identify functionally important protein residues from literature.
Sci Data. 2024 Sep 27;11(1):1032. doi: 10.1038/s41597-024-03841-9.
3
PDB NextGen Archive: centralizing access to integrated annotations and enriched structural information by the Worldwide Protein Data Bank.
Database (Oxford). 2024 May 27;2024. doi: 10.1093/database/baae041.
4
Machine Learning Models to Interrogate Proteome-Wide Covalent Ligandabilities Directed at Cysteines.
JACS Au. 2024 Apr 5;4(4):1374-1384. doi: 10.1021/jacsau.3c00749. eCollection 2024 Apr 22.
5
Machine Learning Models to Interrogate Proteomewide Covalent Ligandabilities Directed at Cysteines.
bioRxiv. 2024 Jan 7:2023.08.17.553742. doi: 10.1101/2023.08.17.553742.

References cited in this article

1
ModelCIF: An Extension of PDBx/mmCIF Data Representation for Computed Structure Models.
J Mol Biol. 2023 Jul 15;435(14):168021. doi: 10.1016/j.jmb.2023.168021. Epub 2023 Feb 23.
2
A strategy for evaluating potential antiviral resistance to small molecule drugs and application to SARS-CoV-2.
Sci Rep. 2023 Jan 10;13(1):502. doi: 10.1038/s41598-023-27649-6.
3
Annotation of biologically relevant ligands in UniProtKB using ChEBI.
Bioinformatics. 2023 Jan 1;39(1). doi: 10.1093/bioinformatics/btac793.
4
The 3D mutational constraint on amino acid sites in the human proteome.
Nat Commun. 2022 Jun 7;13(1):3273. doi: 10.1038/s41467-022-30936-x.
5
Venus: Elucidating the Impact of Amino Acid Variants on Protein Function Beyond Structure Destabilisation.
J Mol Biol. 2022 Jun 15;434(11):167567. doi: 10.1016/j.jmb.2022.167567. Epub 2022 Mar 29.
6
PDBx/mmCIF Ecosystem: Foundational Semantic Tools for Structural Biology.
J Mol Biol. 2022 Jun 15;434(11):167599. doi: 10.1016/j.jmb.2022.167599. Epub 2022 Apr 20.
7
New system for archiving integrative structures.
Acta Crystallogr D Struct Biol. 2021 Dec 1;77(Pt 12):1486-1496. doi: 10.1107/S2059798321010871. Epub 2021 Nov 29.
8
RCSB Protein Data Bank: improved annotation, search and visualization of membrane protein structures archived in the PDB.
Bioinformatics. 2022 Feb 7;38(5):1452-1454. doi: 10.1093/bioinformatics/btab813.
9
AlphaFold Protein Structure Database: massively expanding the structural coverage of protein-sequence space with high-accuracy models.
Nucleic Acids Res. 2022 Jan 7;50(D1):D439-D444. doi: 10.1093/nar/gkab1061.
10
PDBe-KB: collaboratively defining the biological context of structural data.
Nucleic Acids Res. 2022 Jan 7;50(D1):D534-D542. doi: 10.1093/nar/gkab988.