PDBrenum: A webserver and program providing Protein Data Bank files renumbered according to their UniProt sequences.

Affiliations

Institute of Fundamental Medicine and Biology, Kazan Federal University, Kazan, Russian Federation.

Institute for Cancer Research, Fox Chase Cancer Center, Philadelphia, Pennsylvania, United States of America.

Publication information

PLoS One. 2021 Jul 6;16(7):e0253411. doi: 10.1371/journal.pone.0253411. eCollection 2021.

DOI: 10.1371/journal.pone.0253411

PMID: 34228733

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8259974/

Abstract

The Protein Data Bank (PDB) was established at Brookhaven National Laboratory in 1971 as an archive for biological macromolecular crystal structures. As of mid-2021, the database holds almost 180,000 structures solved by X-ray crystallography, nuclear magnetic resonance, cryo-electron microscopy, and other methods. Many proteins have been studied under different conditions, including with binding partners such as ligands, nucleic acids, or other proteins, with mutations, and with post-translational modifications, thus enabling extensive comparative structure-function studies. However, these studies are made more difficult because the PDB allows authors to number the amino acids in each protein sequence in any manner they wish. As a result, the same protein is often numbered differently across the available PDB entries. For instance, some authors include N-terminal signal peptides or the N-terminal methionine in the sequence numbering and others do not. In addition to the coordinates, many fields contain structural and functional information about specific residues, numbered according to the author. Here we provide a webserver and Python3 application that fixes the PDB sequence numbering problem by replacing the author numbering with numbering derived from the corresponding UniProt sequences. We obtain this correspondence from the SIFTS database from PDBe. The server and program can take a list of PDB entries or a list of UniProt identifiers (e.g., "P04637" or "P53_HUMAN") and provide renumbered files in mmCIF format and the legacy PDB format for both asymmetric unit files and biological assembly files provided by PDBe.
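
To make the renumbering idea concrete, the following is a minimal sketch, not the PDBrenum implementation. It assumes the caller has already built a dictionary from (chain ID, author residue number) to UniProt residue number, for example from a SIFTS residue-level mapping. The function name `renumber_pdb_lines`, the shape of `mapping`, and the `unmapped_offset` parameter are hypothetical names introduced for illustration. The sketch rewrites only the residue-number field (columns 23-26) of ATOM and HETATM records in the legacy PDB format; it ignores insertion codes and does not touch the other residue-numbered fields the abstract mentions.

```python
def renumber_pdb_lines(lines, mapping, unmapped_offset=5000):
    """Return legacy-PDB lines with author residue numbers replaced.

    mapping: {(chain_id, author_resseq): uniprot_resseq}; a hypothetical
        input that the caller builds, e.g. from SIFTS.
    unmapped_offset: added to residues missing from `mapping` (tags,
        ligands, waters) so they are less likely to collide with
        renumbered residues. This is a choice made for this sketch.
    """
    renumbered = []
    for line in lines:
        # Only coordinate records are handled in this sketch.
        if line.startswith(("ATOM", "HETATM")) and len(line) >= 26:
            chain_id = line[21]              # column 22: chain identifier
            author_num = int(line[22:26])    # columns 23-26: residue number
            new_num = mapping.get((chain_id, author_num),
                                  author_num + unmapped_offset)
            # The legacy PDB format reserves exactly four columns here.
            if not -999 <= new_num <= 9999:
                raise ValueError(f"residue number {new_num} does not fit "
                                 "in four columns")
            line = f"{line[:22]}{new_num:>4d}{line[26:]}"
        renumbered.append(line)
    return renumbered
```

A caller would read a PDB file into a list of lines, pass it in together with the mapping for that entry, and write the returned lines back out. Residue numbers that no longer fit in four columns raise an error rather than silently corrupting the fixed-width record; the mmCIF format has no such width limit.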

Figure 1 (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ce9a/8259974/0a311c086f70/pone.0253411.g001.jpg
