School of Computer Science, University of Windsor, Windsor, Ontario, Canada.
Inst. of Env. Health Sci., Wayne State University, Detroit, MI, USA.
BMC Bioinformatics. 2018 Nov 20;19(Suppl 14):410. doi: 10.1186/s12859-018-2378-9.
Predicting calmodulin-binding (CaM-binding) proteins is important in biology and biochemistry because calmodulin binds and regulates a multitude of protein targets that affect different cellular processes. Computational methods that can accurately identify CaM-binding proteins and CaM-binding domains would accelerate research in calcium signaling and calmodulin function. Short linear motifs (SLiMs) have been used effectively as features for analyzing protein-protein interactions, but their properties have not yet been exploited for predicting CaM-binding proteins.
We propose a new method for predicting CaM-binding proteins that uses, as features for the prediction module, both the total and the average scores of known and new SLiMs in protein sequences, computed with a new scoring method called sliding window scoring (SWS). A dataset of 194 manually curated human CaM-binding proteins and 193 mitochondrial proteins was obtained and used to test the proposed model. The motif generation tool Multiple EM for Motif Elucidation (MEME) was used to obtain new motifs from the positive and negative datasets individually (the SM approach) and from the combined positive and negative datasets (the CM approach). Feature selection (FS) using the wrapper criterion with random forest was then applied, followed by classification using different algorithms such as k-nearest neighbors (k-NN), support vector machines (SVM), naive Bayes (NB) and random forest (RF).
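The idea of turning motif matches into total and average scores can be illustrated with a minimal Python sketch. This is not the authors' SWS implementation, whose exact definition is not given here. It assumes a motif is represented as a small position-specific scoring matrix; the matrix values and the function names (`window_score`, `sliding_window_scores`, `total_and_average`) are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical 3-residue motif: one dict of residue scores per position.
# The values are illustrative only and are not taken from the study.
Pssm = List[Dict[str, float]]
EXAMPLE_MOTIF: Pssm = [
    {"L": 2.0, "I": 1.5},
    {"K": 1.0, "R": 1.0},
    {"W": 3.0, "F": 2.0},
]


def window_score(window: str, pssm: Pssm, default: float = 0.0) -> float:
    """Sum the per-position scores of one window; unlisted residues get `default`."""
    return sum(column.get(residue, default) for residue, column in zip(window, pssm))


def sliding_window_scores(sequence: str, pssm: Pssm) -> List[float]:
    """Score every window of motif width, moving one residue at a time."""
    width = len(pssm)
    return [
        window_score(sequence[start:start + width], pssm)
        for start in range(len(sequence) - width + 1)
    ]


def total_and_average(sequence: str, pssm: Pssm) -> Tuple[float, float]:
    """Return (total, average) window score: two features per motif per protein."""
    scores = sliding_window_scores(sequence, pssm)
    if not scores:  # sequence shorter than the motif
        return 0.0, 0.0
    total = sum(scores)
    return total, total / len(scores)


if __name__ == "__main__":
    print(total_and_average("ALKWA", EXAMPLE_MOTIF))
```

Under these assumptions, each motif contributes two numbers per protein, so a set of known and newly discovered motifs yields a fixed-length feature vector that a feature-selection step and a classifier could then consume.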
Our proposed method shows very good prediction results and demonstrates that the information contained in SLiMs is highly relevant for predicting CaM-binding proteins. Further, three new CaM-binding motifs were computationally selected and biologically validated in this study; these motifs can be used for predicting CaM-binding proteins.