Miyata Ryosuke, Moriwaki Yoshitaka, Terada Tohru, Shimizu Kentaro
Department of Biotechnology, The University of Tokyo, 1-1-1 Yayoi, Bunkyo-ku, Tokyo, 113-8657, Japan.
Heliyon. 2021 Sep 8;7(9):e07953. doi: 10.1016/j.heliyon.2021.e07953. eCollection 2021 Sep.
Antifreeze proteins (AFPs) protect cellular and body fluids from freezing by inhibiting the nucleation and growth of ice crystals and by preventing ice recrystallization, thereby helping to sustain life in the organisms that produce them. AFPs have been found in fish, insects, microorganisms, and fungi. However, the number of known AFPs is still limited, so it is essential to construct a reliable AFP dataset and to develop bioinformatics tools for predicting AFPs. In this work, we first collected AFP sequences from UniProtKB, taking the reliability of their annotations into account, and then used these datasets to develop a prediction system based on random forest. The system achieved accuracies of 0.961 and 0.947 on non-redundant datasets with less than 90% and less than 30% sequence identity, respectively, and an accuracy of 0.953 on a dataset of representative sequences for each species. Using random forest's ability to estimate feature importance, we identified the sequence features that contributed to the prediction. Some of these features were shared by AFPs from different species, including the Cys content, Ala-Ala content, Trp-Gly content, and the distribution of amino acids associated with disorder propensity. The program and datasets developed in this work are available on GitHub: https://github.com/ryomiya/Prediction-and-analysis-of-antifreeze-proteins.
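Composition features such as the Cys, Ala-Ala, and Trp-Gly contents can be illustrated with a minimal Python sketch. The sketch assumes the common amino-acid composition (AAC) and dipeptide composition (DPC) encodings. AAC is normalized by sequence length, and DPC is normalized by the number of adjacent residue pairs. The abstract does not state the paper's exact feature set or normalization, and the function names below are hypothetical.

```python
from __future__ import annotations

from itertools import product

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered dipeptides: "AA", "AC", ..., "YY".
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def aa_composition(seq: str) -> dict[str, float]:
    """Hypothetical helper: fraction of all residues equal to each standard amino acid."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return {aa: 0.0 for aa in AMINO_ACIDS}
    return {aa: seq.count(aa) / n for aa in AMINO_ACIDS}


def dipeptide_composition(seq: str) -> dict[str, float]:
    """Hypothetical helper: fraction of overlapping adjacent pairs equal to each dipeptide.

    A sliding window is used instead of str.count, because str.count skips
    overlapping matches (e.g. "AAA" contains two "AA" pairs, not one).
    """
    seq = seq.upper()
    n_pairs = len(seq) - 1
    counts = dict.fromkeys(DIPEPTIDES, 0)
    if n_pairs <= 0:
        return {dp: 0.0 for dp in DIPEPTIDES}
    for i in range(n_pairs):
        pair = seq[i:i + 2]
        if pair in counts:  # pairs containing non-standard residues (e.g. X) are not counted
            counts[pair] += 1
    return {dp: c / n_pairs for dp, c in counts.items()}


def feature_vector(seq: str) -> list[float]:
    """420-dimensional vector: 20 AAC values followed by 400 DPC values."""
    aac = aa_composition(seq)
    dpc = dipeptide_composition(seq)
    return [aac[a] for a in AMINO_ACIDS] + [dpc[d] for d in DIPEPTIDES]


if __name__ == "__main__":
    toy = "MCAAAWGC"  # made-up toy sequence, not a real AFP
    aac = aa_composition(toy)
    dpc = dipeptide_composition(toy)
    print(f"Cys content: {aac['C']:.3f}")      # 2 of 8 residues
    print(f"Ala-Ala content: {dpc['AA']:.3f}")  # 2 of 7 adjacent pairs
    print(f"Trp-Gly content: {dpc['WG']:.3f}")  # 1 of 7 adjacent pairs
    print(f"Feature vector length: {len(feature_vector(toy))}")
```

The resulting vectors could then be passed to a random forest classifier. One example is scikit-learn's `RandomForestClassifier`, whose `feature_importances_` attribute reports impurity-based importances. This is one way to rank features such as Cys or Ala-Ala content, and it is not necessarily the procedure used in the paper.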