University of Maryland, Department of Chemical and Biomolecular Engineering, College Park, Maryland 20742, United States.
Summer Undergraduate Research Fellowship, National Institute of Standards and Technology, Boulder, Colorado 80305, United States.
J Chem Inf Model. 2019 Feb 25;59(2):931-943. doi: 10.1021/acs.jcim.8b00950. Epub 2019 Feb 15.
Cysteine is a multifaceted amino acid that is central to the structure and function of many proteins. A disulfide bond formed between two cysteines restrains protein conformations through the strong covalent bond and through torsions about the bond, which energetically prefer ±90°. In this study, we transform over 30 000 Protein Data Bank files (PDBx/mmCIF) into a single SQLite database file (Cys.sqlite). The database schema is designed to accommodate structural information on both oxidized and reduced cysteines and to retain essential protein metadata to establish informational and biological provenance. Cys.sqlite contains over 95 000 peptide chains and 500 000 cysteines (700 000 structural conformers); there are over 265 000 cysteine disulfide bond conformations from structures solved with all available experimental methods. The structural information is analyzed with respect to sequence identity cutoff, experimental method, and the energetics of the disulfide. We find that as the experimental information becomes limiting and the influence of modeling becomes more pronounced, the observed average strain increases artificially. The database and analyses presented here can be used to improve the refinement of experimentally determined biological structures that are known to contain one or more disulfide bonds.
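To make the ±90° torsional preference concrete, the following is a minimal Python sketch of how the torsion about a disulfide bond (commonly written χ3, defined by the atoms Cβ-Sγ-Sγ'-Cβ') could be computed from four atomic coordinates. The function names, the tuple-based coordinate format, and the example coordinates are illustrative assumptions; they are not taken from Cys.sqlite or from the authors' analysis code.

```python
import math

def _sub(a, b):
    """Component-wise difference a - b of two 3D points."""
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    """Dot product of two 3D vectors."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    """Cross product of two 3D vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral_deg(p0, p1, p2, p3):
    """Signed dihedral angle in degrees for four points.

    For a disulfide, pass (CB, SG, SG', CB') to obtain chi3.
    The result lies in the range (-180, 180].
    """
    b0 = _sub(p1, p0)        # first bond vector
    b1 = _sub(p2, p1)        # central bond vector (the S-S bond for chi3)
    b2 = _sub(p3, p2)        # last bond vector
    n1 = _cross(b0, b1)      # normal to the plane of p0, p1, p2
    n2 = _cross(b1, b2)      # normal to the plane of p1, p2, p3
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b1, b1)) * _dot(b0, n2)
    return math.degrees(math.atan2(y, x))

def deviation_from_90(chi3_deg):
    """Absolute deviation (degrees) of chi3 from the nearer of +90 or -90."""
    return abs(abs(chi3_deg) - 90.0)

# Idealized, made-up coordinates (not real PDB data) chosen so that
# the torsion about the central bond is exactly +90 degrees.
cb1 = (1.0, 0.0, 0.0)
sg1 = (0.0, 0.0, 0.0)
sg2 = (0.0, 0.0, 1.0)
cb2 = (0.0, 1.0, 1.0)

chi3 = dihedral_deg(cb1, sg1, sg2, cb2)
```

With real data, the four coordinates would be read from the Cβ and Sγ atom records of the two bonded cysteines. The helper `deviation_from_90` is only a simple geometric measure of distance from the preferred torsion; it is not the strain energy analyzed in the study.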