School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, China.
Department of Computer Science, Abdul Wali Khan University Mardan, Pakistan.
Comput Biol Med. 2021 Dec;139:105006. doi: 10.1016/j.compbiomed.2021.105006. Epub 2021 Nov 2.
In extremely cold environments, living organisms such as plants, animals, fish, and microbes can die from intracellular ice formation. To survive in such environments, some cold-blooded species produce antifreeze proteins (AFPs), also called ice-binding proteins. AFPs are relevant not only to medicine but also to biotechnology, agriculture, and the food industry. AFPs are highly heterogeneous in both structure and sequence. Given their significance, several machine-learning-based models have been developed to predict AFPs. However, because AFPs are complex and diverse, the prediction performance of existing methods is limited, and a reliable computational model that can accurately predict AFPs is still needed. This study presents a novel predictor for AFPs, named AFP-CMBPred. AFP sequences are encoded with four feature representation methods, namely Amphiphilic pseudo amino acid composition (Amp-PseAAC), Dipeptide Deviation from Expected Mean (DDE), Multi-Blocks Position Specific Scoring Matrix (MB-PSSM), and Consensus Sequence-based on Multi-Blocks Position Specific Scoring Matrix (CS-MB-PSSM), to capture local and global descriptors. The extracted feature vectors are then evaluated with Support Vector Machine (SVM) and Random Forest (RF) classifiers. The prediction performance of both classifiers is further assessed using three validation methods: the jackknife test, the 10-fold cross-validation test, and an independent test. Across these tests, the proposed model improved prediction accuracy by ∼2.65%, ∼2.84%, and ∼3.37% on the jackknife, 10-fold cross-validation, and independent tests, respectively, relative to existing methods.
The experimental outcomes indicate that the proposed AFP-CMBPred predictor outperforms existing models for the identification of AFPs. It is further anticipated that AFP-CMBPred will be a valuable tool for academic research and drug development.
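Of the four feature representations named above, DDE is the simplest to illustrate. The abstract does not give its formula, so the following is only a minimal sketch based on the commonly cited definition of DDE: the observed dipeptide frequency is compared with a theoretical mean derived from codon counts and scaled by the theoretical standard deviation. Whether AFP-CMBPred computes DDE in exactly this way is an assumption, and the names `CODON_COUNTS` and `dde_features` are hypothetical.

```python
import math
from itertools import product

# Number of codons encoding each amino acid in the standard genetic code.
CODON_COUNTS = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}
AMINO_ACIDS = sorted(CODON_COUNTS)
N_SENSE_CODONS = sum(CODON_COUNTS.values())  # 61 codons, stop codons excluded


def dde_features(sequence):
    """Return a dict mapping each of the 400 dipeptides to its DDE value.

    Hypothetical helper: assumes the commonly cited DDE definition,
    DDE = (Dc - Tm) / sqrt(Tv), and is not taken from the paper itself.
    """
    seq = sequence.upper()
    n_pairs = len(seq) - 1  # number of overlapping dipeptides
    if n_pairs < 1:
        raise ValueError("sequence must contain at least two residues")

    counts = {a + b: 0 for a, b in product(AMINO_ACIDS, repeat=2)}
    for i in range(n_pairs):
        pair = seq[i:i + 2]
        # Pairs containing non-standard residues (e.g. X) are not counted,
        # but they still contribute to n_pairs.
        if pair in counts:
            counts[pair] += 1

    features = {}
    for pair, count in counts.items():
        dc = count / n_pairs  # observed dipeptide composition
        # Theoretical mean from the codon counts of both residues.
        tm = (CODON_COUNTS[pair[0]] / N_SENSE_CODONS) * (
            CODON_COUNTS[pair[1]] / N_SENSE_CODONS
        )
        tv = tm * (1.0 - tm) / n_pairs  # theoretical variance
        features[pair] = (dc - tm) / math.sqrt(tv)
    return features


# Illustrative call on a made-up sequence containing each residue once.
example = dde_features("ACDEFGHIKLMNPQRSTVWY")
```

The resulting 400-dimensional vector could then be passed to any classifier. Dipeptides that never occur in a sequence receive a negative value, because their observed frequency of zero lies below the theoretical mean.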