AFP-Pred: A random forest approach for predicting antifreeze proteins from sequence-derived properties.

Affiliation

Institute for Neuro- and Bioinformatics, University of Lübeck, 23538 Lübeck, Germany.

Publication information

J Theor Biol. 2011 Feb 7;270(1):56-62. doi: 10.1016/j.jtbi.2010.10.037. Epub 2010 Nov 4.

Abstract

Some organisms living at extremely low temperatures produce special proteins called "antifreeze proteins" (AFPs), which prevent cells and body fluids from freezing. AFPs are present in vertebrates, invertebrates, plants, bacteria, fungi, and other organisms. Although AFPs share a common function, they show a high degree of diversity in sequence and structure. Therefore, search methods based on sequence similarity often fail to identify AFPs in sequence databases. In this work, we report a random forest approach, "AFP-Pred", for predicting antifreeze proteins from protein sequence. AFP-Pred was trained on a dataset containing 300 AFPs and 300 non-AFPs and tested on a dataset containing 181 AFPs and 9193 non-AFPs. AFP-Pred achieved 81.33% accuracy in training and 83.38% in testing. The performance of AFP-Pred was compared with that of BLAST and HMM. The high prediction accuracy and the successful prediction of hypothetical proteins suggest that AFP-Pred can be a useful approach for identifying antifreeze proteins from sequence information, irrespective of their sequence similarity.
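
The abstract does not list which sequence-derived properties AFP-Pred uses. As an illustration only, the sketch below computes amino acid composition, one common sequence-derived feature; the choice of feature, the function name `amino_acid_composition`, and the toy sequence are assumptions, not the authors' method. Fixed-length vectors of this kind are the sort of input a random forest classifier could be trained on.

```python
from collections import Counter

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return the fraction of each standard amino acid in a protein sequence.

    Illustrative feature only; the abstract does not say which
    sequence-derived properties AFP-Pred actually uses.
    """
    seq = sequence.upper()
    # Count only standard residues; ambiguous codes such as X are ignored.
    counts = Counter(residue for residue in seq if residue in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    # One value per amino acid, in the fixed order of AMINO_ACIDS.
    return [counts[aa] / total for aa in AMINO_ACIDS]


# Hypothetical toy sequence, not a real antifreeze protein.
features = amino_acid_composition("ACDA")
```

Because the vector always has 20 entries regardless of sequence length, proteins with little or no sequence similarity can still be compared in the same feature space.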
