Rosenberg Aviv A, Marx Ailie, Bronstein Alexander M
Department of Computer Science, Technion - Israel Institute of Technology, Haifa, Israel.
Department of Molecular and Computational Biosciences and Biotechnology, Migal - Galilee Research Institute, Qiryat, Israel.
Sci Data. 2024 Jul 17;11(1):783. doi: 10.1038/s41597-024-03595-4.
Protein Data Bank (PDB) files list the relative spatial locations of atoms in a protein structure as the final output of fitting and refining a model against experimentally determined electron density measurements. Where experimental evidence exists for multiple conformations, atoms are modelled in alternate locations. Programs that read PDB files commonly ignore these alternate conformations by default, leaving users unaware of their presence in the structures they analyze. This has led to underappreciation of their prevalence, undercharacterisation of their features, and limited access to these high-resolution data representing structural ensembles. We have trawled PDB files to extract structural features of residues with alternately located atoms. The output includes the distance between alternate conformations, the location of these segments within the protein chain, and their proximity to all other atoms within a defined radius. This dataset should be of use in efforts to predict multiple structures from a single sequence and should support studies investigating protein flexibility and its association with protein function.
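To make the idea of alternately located atoms concrete, the following is a minimal sketch of how the distance between alternate conformations of the same atom might be computed from the text of a PDB file. It is not the authors' pipeline. It assumes the legacy fixed-column PDB layout, in which the alternate location indicator occupies column 17 of each ATOM or HETATM record, and the function names `parse_altloc_atoms` and `altloc_distances` as well as the three-line `SAMPLE` structure are hypothetical, introduced only for illustration.

```python
import math
from collections import defaultdict

# Hypothetical three-atom example in fixed-column PDB format.
# Residue 10 has its CA atom modelled at two alternate locations, A and B.
SAMPLE = """
ATOM      1  CA ASER A  10      11.000  12.000  13.000  0.60 20.00
ATOM      2  CA BSER A  10      14.000  16.000  13.000  0.40 20.00
ATOM      3  CA  GLY A  11      15.000  17.000  14.000  1.00 20.00
"""


def parse_altloc_atoms(pdb_text):
    """Collect coordinates of atoms that carry an alternate location label.

    Returns a dict mapping an atom identity
    (chain, residue number, insertion code, residue name, atom name)
    to a dict of {altloc label: (x, y, z)}.
    """
    atoms = defaultdict(dict)
    for line in pdb_text.splitlines():
        # Only coordinate records are relevant; skip lines too short to hold x, y, z.
        if not line.startswith(("ATOM", "HETATM")) or len(line) < 54:
            continue
        altloc = line[16]  # column 17 (1-based) holds the altLoc indicator
        if altloc == " ":
            continue  # atom modelled in a single location
        key = (
            line[21],               # chain identifier
            int(line[22:26]),       # residue sequence number
            line[26],               # insertion code
            line[17:20].strip(),    # residue name
            line[12:16].strip(),    # atom name
        )
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        atoms[key][altloc] = xyz
    return atoms


def altloc_distances(pdb_text):
    """Return the distance between every pair of alternate locations of each atom.

    Keys are the atom identity followed by the two altLoc labels being compared.
    Because the residue name is part of the identity, alternates that differ in
    residue type are not compared in this simplified sketch.
    """
    distances = {}
    for key, conformers in parse_altloc_atoms(pdb_text).items():
        labels = sorted(conformers)
        for i, first in enumerate(labels):
            for second in labels[i + 1:]:
                distances[key + (first, second)] = math.dist(
                    conformers[first], conformers[second]
                )
    return distances
```

Calling `altloc_distances(SAMPLE)` yields a single entry, for the CA atom of residue 10 in chain A, because the glycine atom has a blank alternate location indicator and is skipped. A reader that keeps only the first or highest-occupancy location would silently discard this information, which is the default behaviour the abstract describes.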