Department of Electrical Engineering and Computer Science, Texas A&M University-Kingsville, Kingsville, TX, USA.
Department of Computer Science, University of New Orleans, New Orleans, LA, USA.
Comput Biol Chem. 2021 Apr;91:107436. doi: 10.1016/j.compbiolchem.2021.107436. Epub 2021 Jan 27.
The protein disulfide bond is a covalent bond formed during post-translational modification by the oxidation of a pair of cysteines. In proteins, the disulfide bond is the most frequent covalent link between amino acids after the peptide bond. It plays a significant role in stabilizing protein conformation, in post-translational modification, and in protein folding, as well as in three-dimensional (3D) ab initio protein structure prediction (aiPSP). In aiPSP, knowing the locations of disulfide bonds can greatly reduce the conformational search space by imposing geometric constraints. Existing experimental techniques for determining disulfide bonds are time-consuming and expensive, so sequence-based computational methods for disulfide bond prediction are indispensable. This study proposes diSBPred, a stacking-based machine learning approach for disulfide bond prediction. A variety of sequence- and structure-based features are extracted for training, including the conservation profile, residue solvent accessibility, torsion angle flexibility, disorder probability, and the sequence distance between cysteines, among others. Prediction is carried out in two stages: first, each individual cysteine is classified as bonding or non-bonding; second, each cysteine pair is classified as bonding or non-bonding, with the stage-one cysteine bonding predictions included as a feature. A comparison of the features used in this study with those used by the existing nearest neighbor algorithm (NNA) method shows that the new features improve jackknife validation balanced accuracy by about 7.39%. Moreover, diSBPred achieves 10-fold cross-validation balanced accuracies of 82.29% for individual cysteine bonding prediction and 94.20% for cysteine-pair bonding prediction.
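Balanced accuracy, the metric behind the figures above, averages sensitivity (recall on the bonding class) and specificity (recall on the non-bonding class), so a predictor cannot score well simply by labeling everything as the majority non-bonding class. The minimal Python sketch below uses the standard binary definition; we assume, without confirmation from the abstract, that diSBPred reports this definition, and the function name `balanced_accuracy` is our own.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (1 = bonding, 0 = non-bonding)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # bonding, predicted bonding
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # bonding, missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-bonding, correctly rejected
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # non-bonding, false alarm
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("y_true must contain both classes")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


if __name__ == "__main__":
    # An all-negative predictor on an imbalanced set: plain accuracy is 3/4,
    # but balanced accuracy exposes that it never finds a bonding example.
    print(balanced_accuracy([1, 0, 0, 0], [0, 0, 0, 0]))  # 0.5
```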
Altogether, our predictor achieves an improvement of 43.25% in balanced accuracy over the existing NNA-based approach. Thus, diSBPred can be used to annotate the cysteine bonding residues of protein sequences whose structures are unknown and to improve the accuracy of aiPSP methods, which can further aid experimental studies of disulfide bonds and structure determination.
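The two-stage scheme described in the abstract (per-cysteine prediction, then per-pair prediction with the stage-one output as a feature) can be illustrated with a small Python sketch. This is a hypothetical illustration, not diSBPred's code: it only enumerates candidate cysteine pairs and builds two of the pair features the abstract names, the sequence distance between the cysteines and the stage-one bonding probabilities. The names `cysteine_positions`, `pair_features`, and `stage1_prob` are invented here, and the stage-one probabilities are assumed to come from some already-trained cysteine classifier.

```python
from itertools import combinations


def cysteine_positions(sequence):
    """Return 0-based indices of every cysteine ('C') in a protein sequence."""
    return [i for i, residue in enumerate(sequence) if residue == "C"]


def pair_features(sequence, stage1_prob):
    """Build stage-two feature rows for every candidate cysteine pair.

    stage1_prob is a hypothetical mapping from cysteine index to the
    stage-one probability that the cysteine is bonding.
    """
    rows = []
    for i, j in combinations(cysteine_positions(sequence), 2):
        rows.append({
            "pair": (i, j),
            "seq_distance": j - i,  # sequence separation in residues
            "p_i": stage1_prob[i],  # stage-one output reused as a feature
            "p_j": stage1_prob[j],
        })
    return rows


if __name__ == "__main__":
    seq = "ACDCGC"  # toy sequence with cysteines at indices 1, 3 and 5
    probs = {1: 0.9, 3: 0.2, 5: 0.8}  # made-up stage-one probabilities
    for row in pair_features(seq, probs):
        print(row)
```

A protein with n cysteines yields n(n-1)/2 candidate pairs, while each cysteine usually takes part in at most one disulfide bond, so most candidate pairs are non-bonding; this imbalance is one reason balanced accuracy is the natural metric for the pair stage.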