Department of Computer Engineering, Chosun University, Gwangju, 61452, Republic of Korea.
Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, 34141, Republic of Korea.
Sci Rep. 2020 Apr 28;10(1):7197. doi: 10.1038/s41598-020-63259-2.
Species living in extremely cold environments resist freezing by means of antifreeze proteins (AFPs). Besides being essential to organisms that live at sub-zero temperatures, AFPs have numerous industrial applications. AFPs bear very little resemblance to one another and cannot be easily identified using simple search algorithms such as BLAST and PSI-BLAST. The diverse AFP sub-types found in fishes (Type I, II, III, IV and antifreeze glycoproteins (AFGPs)) show low sequence and structural similarity, which makes their accurate prediction challenging. Although several machine-learning methods have been proposed for the classification of AFPs, more reliable prediction methods are required. In this paper, we propose a novel machine-learning-based approach for the prediction of AFP sequences using latent space learning through a deep auto-encoder. For latent space pruning, we use the output of the auto-encoder with a deep neural network classifier to learn the non-linear mapping between the protein sequence descriptor and the class label. A comprehensive ablation study is performed, and the proposed method is evaluated in terms of widely used performance measures. In particular, the proposed method achieved a Matthews correlation coefficient of 0.52, an F-score of 0.49, and a Youden's index of 0.81 on an independent test dataset, thereby outperforming the existing methods for AFP prediction.
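The three performance measures named in the abstract can all be derived from a binary confusion matrix. The following is a minimal sketch of their standard definitions, not the authors' evaluation code. The function name `binary_metrics` and the example counts are hypothetical and are not taken from the paper.

```python
import math


def _safe_div(num, den):
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def binary_metrics(tp, fp, tn, fn):
    """Compute MCC, F-score and Youden's index from confusion-matrix counts.

    tp/fn count the positive (AFP) class, tn/fp the negative (non-AFP) class.
    """
    sensitivity = _safe_div(tp, tp + fn)  # true positive rate (recall)
    specificity = _safe_div(tn, tn + fp)  # true negative rate
    precision = _safe_div(tp, tp + fp)

    # F-score (F1): harmonic mean of precision and sensitivity.
    f_score = _safe_div(2 * precision * sensitivity, precision + sensitivity)

    # Matthews correlation coefficient; defined as 0 when any margin is empty.
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = _safe_div(tp * tn - fp * fn, mcc_den)

    # Youden's index J = sensitivity + specificity - 1.
    youden = sensitivity + specificity - 1

    return {"mcc": mcc, "f_score": f_score, "youden": youden}


# Hypothetical, imbalanced example (NOT the paper's data):
# 100 positives and 1000 negatives, each class classified 90% correctly.
example = binary_metrics(tp=90, fp=100, tn=900, fn=10)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

Youden's index depends only on the per-class rates. The F-score and MCC also depend on how many false positives the larger negative class contributes, so on an imbalanced test set they can be noticeably lower than Youden's index, as in this hypothetical example.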