López Yosvany, Dehzangi Abdollah, Reddy Hamendra Manhar, Sharma Alok
Genesis Institute of Genetic Research, Genesis Healthcare Co., Tokyo, Japan.
Department of Computer Science, Morgan State University, Baltimore, Maryland, USA.
Comput Biol Chem. 2020 Aug;87:107235. doi: 10.1016/j.compbiolchem.2020.107235. Epub 2020 Feb 19.
Post-translational modifications are considered important molecular interactions in protein science. One of these modifications is sumoylation, whose computational detection has recently become a challenge. In this paper, we propose a new computational predictor that uses the sine and cosine of backbone torsion angles, together with the accessible surface area, to predict sumoylation sites. These features were computed for all proteins in our benchmark dataset, and a training matrix consisting of sumoylation and non-sumoylation sites was created. The training matrix was balanced by undersampling the majority class (non-sumoylation sites) with the NearMiss method. Finally, an AdaBoost classifier was used to discriminate between sumoylation and non-sumoylation sites. We call our predictor "C-iSumo" because of its effective use of circular functions. C-iSumo was compared with another predictor and outperformed it on statistical metrics such as sensitivity (0.734), accuracy (0.746) and Matthews correlation coefficient (0.494).
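The circular encoding and the reported metrics can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the choice of phi and psi as the torsion angles, and the example values are assumptions made for illustration, and the NearMiss undersampling and AdaBoost steps are not reproduced. The sketch shows why sine and cosine are useful here: angles of 179 and -179 degrees are numerically far apart but nearly identical on the circle, and the encoding preserves that closeness.

```python
import math


def circular_features(phi_deg, psi_deg, asa):
    """Encode one residue as sin/cos of two torsion angles plus ASA.

    phi_deg, psi_deg: backbone torsion angles in degrees (assumed inputs).
    asa: accessible surface area, passed through unchanged.
    """
    phi = math.radians(phi_deg)
    psi = math.radians(psi_deg)
    return [math.sin(phi), math.cos(phi), math.sin(psi), math.cos(psi), asa]


def binary_metrics(tp, fp, tn, fn):
    """Return (sensitivity, accuracy, MCC) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when any marginal is zero; 0.0 is a common convention.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, accuracy, mcc


# Hypothetical residues: raw angles differ by 358 degrees, yet the
# encoded feature vectors are almost the same.
a = circular_features(179.0, -60.0, 0.4)
b = circular_features(-179.0, -60.0, 0.4)
encoded_distance = math.dist(a, b)
```

Feature vectors of this kind, one per candidate site, would form the rows of the training matrix that is then balanced and passed to the classifier.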