Department of Computer Science, University of New Orleans, 2000 Lakeshore Dr, New Orleans, LA, 70148, USA.
Department of Electrical Engineering and Computer Science, Texas A&M University-Kingsville, 700 University Blvd, Kingsville, TX, 78363, USA.
Carbohydr Res. 2019 Dec 1;486:107857. doi: 10.1016/j.carres.2019.107857. Epub 2019 Oct 24.
Carbohydrate-binding proteins play vital roles in many important biological processes. Studying protein-carbohydrate interactions at the residue level is useful in treating many critical diseases. Predicting protein-carbohydrate binding sites by analyzing the local sequential environments of binding and non-binding regions is a challenging problem in molecular and computational biology. Existing experimental methods for identifying these binding sites are laborious and expensive. Computational prediction of binding sites directly from sequence can therefore help annotate them quickly and guide the experimental process. Because the number of carbohydrate-binding residues is significantly lower than the number of non-carbohydrate-binding residues, most methods developed for predicting protein-carbohydrate binding sites are biased towards overpredicting the negative (non-carbohydrate-binding) class. Here, we propose a balanced predictor, called StackCBPred, which uses features extracted from an evolution-driven sequence profile, the position-specific scoring matrix (PSSM), together with several predicted structural properties of amino acids, to train a stacking-based machine learning method for the accurate prediction of protein-carbohydrate binding sites (https://bmll.cs.uno.edu/).
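The two ideas the abstract relies on, describing each residue by its local sequential environment and correcting the imbalance between binding and non-binding residues, can be illustrated with a minimal sketch. This is not the StackCBPred implementation: the window half-width, the zero padding at the sequence ends, the use of random undersampling as the balancing step, and the names `window_features` and `undersample` are all assumptions made for illustration. The abstract does not specify any of them, and the toy profile has two columns, whereas a real PSSM has one column per amino acid type.

```python
import random


def window_features(profile, window=3, pad=0.0):
    """Describe each residue by the profile rows within +/- `window` positions.

    `profile` is a list of per-residue rows (for example, PSSM rows).
    Positions that fall outside the sequence are filled with `pad`.
    The window size and padding value are illustrative assumptions.
    """
    n_cols = len(profile[0])
    pad_row = [pad] * n_cols
    features = []
    for i in range(len(profile)):
        row = []
        for j in range(i - window, i + window + 1):
            row.extend(profile[j] if 0 <= j < len(profile) else pad_row)
        features.append(row)
    return features


def undersample(features, labels, seed=0):
    """Randomly drop majority-class residues until both classes are equal in size.

    Random undersampling is one common balancing strategy; it is an
    assumption here, not a step taken from the abstract.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    keep = sorted(minority + rng.sample(majority, len(minority)))
    return [features[i] for i in keep], [labels[i] for i in keep]


# Toy example: 10 residues, 2 profile columns, 2 binding residues (label 1).
profile = [[float(i), float(-i)] for i in range(10)]
labels = [0, 0, 1, 0, 0, 0, 0, 1, 0, 0]

X = window_features(profile, window=1)   # 3 positions x 2 columns = 6 features
X_balanced, y_balanced = undersample(X, labels)
```

The balanced feature vectors would then be passed to base classifiers whose predictions feed a meta-classifier, which is the general shape of a stacking method. The choice of learners is left out here because the abstract does not name them.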