PDBrenum: A webserver and program providing Protein Data Bank files renumbered according to their UniProt sequences.

Affiliations

Institute of Fundamental Medicine and Biology, Kazan Federal University, Kazan, Russian Federation.

Institute for Cancer Research, Fox Chase Cancer Center, Philadelphia, Pennsylvania, United States of America.

Publication information

PLoS One. 2021 Jul 6;16(7):e0253411. doi: 10.1371/journal.pone.0253411. eCollection 2021.

Abstract

The Protein Data Bank (PDB) was established at Brookhaven National Laboratory in 1971 as an archive for biological macromolecular crystal structures. As of mid-2021, the database held almost 180,000 structures determined by X-ray crystallography, nuclear magnetic resonance, cryo-electron microscopy, and other methods. Many proteins have been studied under different conditions (with binding partners such as ligands, nucleic acids, or other proteins, with mutations, and with post-translational modifications), which enables extensive comparative structure-function studies. However, these studies are made more difficult because the PDB allows authors to number the amino acids in each protein sequence in any manner they wish. As a result, the same protein may be numbered differently in different PDB entries. For instance, some authors include N-terminal signal peptides or the N-terminal methionine in the sequence numbering and others do not. In addition to the coordinates, many fields in a PDB entry contain structural and functional information about specific residues, identified by the author's numbering. Here we provide a webserver and Python3 application that fixes the PDB sequence numbering problem by replacing the author numbering with numbering derived from the corresponding UniProt sequences. We obtain this correspondence from the SIFTS database from PDBe. The server and program can take a list of PDB entries or a list of UniProt identifiers (e.g., "P04637" or "P53_HUMAN") and provide renumbered files in both mmCIF format and the legacy PDB format, for both the asymmetric unit files and the biological assembly files provided by PDBe.
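To make the renumbering idea concrete, here is a minimal Python sketch. It assumes the residue-level correspondence between author numbering and UniProt positions has already been extracted from SIFTS into a plain dictionary keyed by (chain, author residue number, insertion code). The `Residue` class, the `uniprot_numbers` function and the example data are hypothetical illustrations, not PDBrenum's code or API, and parsing the actual SIFTS files is not shown.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical key identifying a residue in the author's numbering:
# (chain ID, author residue number, insertion code; "" when there is none).
ResidueKey = Tuple[str, int, str]


@dataclass(frozen=True)
class Residue:
    chain: str
    number: int     # author-assigned residue number
    ins_code: str   # author insertion code, "" when absent
    name: str       # three-letter residue name


def uniprot_numbers(
    residues: List[Residue],
    author_to_uniprot: Dict[ResidueKey, int],
) -> List[Tuple[Residue, Optional[int]]]:
    """Pair each residue with its UniProt position, or None if the map has no entry."""
    return [
        (res, author_to_uniprot.get((res.chain, res.number, res.ins_code)))
        for res in residues
    ]


# Hypothetical entry: the authors dropped the initiator Met and started at 1,
# so author residue n corresponds to UniProt position n + 1. Residue -1 stands
# for an expression-tag residue with no UniProt counterpart.
residues = [
    Residue("A", -1, "", "HIS"),
    Residue("A", 1, "", "SER"),
    Residue("A", 2, "", "GLU"),
    Residue("A", 3, "", "LYS"),
]
mapping = {("A", n, ""): n + 1 for n in range(1, 4)}

for res, up in uniprot_numbers(residues, mapping):
    print(res.chain, res.name, res.number, "->", up)
```

Residues without a UniProt counterpart, such as expression tags or ligands, come back as `None`; how such residues should be numbered in the output files is a separate choice that this sketch leaves open.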

Figure 1 (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ce9a/8259974/0a311c086f70/pone.0253411.g001.jpg
