Department of Biophysics, School of Medicine, Marmara University, Istanbul, Turkey.
Protein J. 2020 Feb;39(1):21-32. doi: 10.1007/s10930-020-09880-6.
One class of secondary structure prediction algorithms uses statistics on the residue pairs found in secondary structural elements. Because the protein folding process is dominated by backbone hydrogen bonding, an approach based on backbone hydrogen-bonded residue pairings should improve the predictive capability of this class of algorithms. The reliability of such algorithms depends on the quality of the statistics and, therefore, of the underlying data set. This study aimed to determine the propensities of backbone hydrogen-bonded residue pairings in the α-helix and β-sheet secondary structural elements of globular proteins, using a new and comprehensive data set built from the peptides deposited in the Worldwide Protein Data Bank. A master data set of 4882 globular peptide chains was created, with resolution better than 2.5 Å, sequence identity below 25%, and a length of at least 100 residues. Separate subsets for helix and sheet structures were derived from the master set, containing 4594 and 4483 chains, respectively. Backbone hydrogen-bonded residue pairings in helices and sheets were detected, and their propensities were expressed as odds ratios (observed/[random or expected]) arranged in matrices. The propensities assigned by this study to residue pairings in secondary structural elements (helix, overall strands, parallel strands, and antiparallel strands) differ from those of previous studies by 19 to 34%. These differences are substantial and could lead to further improvements in secondary structure prediction algorithms.
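As an illustration of the odds-ratio representation (observed/expected), the following is a minimal Python sketch, not the authors' implementation. The function name `pair_odds_ratios` and its input format are hypothetical. The abstract does not state how the random or expected count is obtained, so the sketch assumes it comes from pairing residues at random according to their frequencies among all hydrogen-bonded positions.

```python
from collections import Counter
from itertools import combinations_with_replacement


def pair_odds_ratios(pairs):
    """Return {(a, b): observed / expected} for unordered residue pairs.

    `pairs` is an iterable of (residue, residue) one-letter codes, one entry
    per backbone hydrogen-bonded pairing (hypothetical input format).
    Assumption: the expected count is that of random pairing, using each
    residue's frequency among all paired positions.
    """
    # Treat (A, L) and (L, A) as the same unordered pairing.
    observed = Counter(tuple(sorted(p)) for p in pairs)
    n_pairs = sum(observed.values())

    # Count each residue type over all 2 * n_pairs paired positions.
    residue_counts = Counter()
    for (a, b), n in observed.items():
        residue_counts[a] += n
        residue_counts[b] += n
    freq = {r: c / (2 * n_pairs) for r, c in residue_counts.items()}

    ratios = {}
    for a, b in combinations_with_replacement(sorted(freq), 2):
        # A mixed pair can arise in two orders, hence the factor of 2.
        multiplicity = 1 if a == b else 2
        expected = n_pairs * freq[a] * freq[b] * multiplicity
        ratios[(a, b)] = observed.get((a, b), 0) / expected
    return ratios


# Toy example: four pairings, all between alanine (A) and leucine (L).
toy_ratios = pair_odds_ratios([("A", "L"), ("L", "A"), ("A", "L"), ("L", "A")])
```

Under this assumption, a ratio above 1 means a pairing occurs more often than random pairing would give, and a ratio below 1 means it occurs less often. The resulting dictionary can be laid out as a symmetric 20 × 20 matrix.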