

Optimal subset selection of primary sequence features using the genetic algorithm for thermophilic proteins identification.

Author information

Wang LiQiang, Li CuiFeng

Affiliation

Department of Biochemistry and Molecular Biology, College of Life Science, Nankai University, Weijin Road 94, Tianjin, 300071, China

Publication information

Biotechnol Lett. 2014 Oct;36(10):1963-9. doi: 10.1007/s10529-014-1577-3. Epub 2014 Jun 15.

Abstract

A genetic algorithm (GA) coupled with multiple linear regression (MLR) was used to extract useful features from amino acids and g-gap dipeptides for distinguishing between thermophilic and non-thermophilic proteins. The method was trained on a benchmark dataset of 915 thermophilic and 793 non-thermophilic proteins. Using nine amino acids, 38 0-gap dipeptides and 29 1-gap dipeptides, it reached an overall accuracy of 95.4 % in a jackknife test. Accuracy across protein sizes ranged from 85.8 to 96.9 %. The overall accuracies of three independent tests were 93, 93.4 and 91.8 %. These results for detecting thermophilic proteins suggest that the GA-MLR approach described here could be a powerful method for selecting features that describe thermostable proteins and could aid in the design of more stable proteins.
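The abstract does not spell out how the features are encoded. A common definition of g-gap dipeptide composition counts ordered residue pairs (seq[i], seq[i + g + 1]), so g = 0 gives ordinary adjacent dipeptides and g = 1 skips one residue, and it normalizes each count by the number of such pairs, L - g - 1. The sketch below assumes that definition together with a plain 20-residue amino acid composition. The function names are hypothetical, and the paper's exact normalization may differ.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def aa_composition(seq):
    """Fraction of each of the 20 standard amino acids in seq."""
    seq = seq.upper()
    n = len(seq)
    return {aa: seq.count(aa) / n for aa in AMINO_ACIDS}


def g_gap_dipeptide_composition(seq, g):
    """Frequencies of ordered residue pairs (seq[i], seq[i + g + 1]).

    Assumed definition: counts are divided by the number of pairs, L - g - 1.
    Returns a dict over all 400 ordered pairs of standard residues.
    """
    seq = seq.upper()
    n_pairs = len(seq) - g - 1
    if n_pairs <= 0:
        raise ValueError("sequence too short for this gap")
    counts = {a + b: 0 for a, b in product(AMINO_ACIDS, repeat=2)}
    for i in range(n_pairs):
        pair = seq[i] + seq[i + g + 1]
        if pair in counts:  # pairs with non-standard residues are skipped
            counts[pair] += 1
    return {k: v / n_pairs for k, v in counts.items()}


def feature_vector(seq, gaps=(0, 1)):
    """Concatenate amino acid composition (20) with one 400-dim block per gap."""
    vec = list(aa_composition(seq).values())
    for g in gaps:
        vec.extend(g_gap_dipeptide_composition(seq, g).values())
    return vec
```

With gaps (0, 1) this yields 20 + 400 + 400 = 820 candidate features per protein. The abstract reports that the final model used 76 of them (9 + 38 + 29).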

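A GA-MLR feature-selection loop of the kind the abstract names can be sketched as follows, under several assumptions. Each chromosome is a binary mask over candidate features. An MLR model regresses a 0/1 class label (1 = thermophilic) on the selected features, and its output is thresholded at 0.5. Fitness is the jackknife (leave-one-out) accuracy of that model. The population size, tournament selection, one-point crossover, mutation rate, decision threshold and the tiny ridge term are illustrative choices, not details reported in the abstract. All function names are hypothetical.

```python
import random


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_mlr(rows, y, ridge=1e-6):
    """Least-squares fit of y = b0 + b1*x1 + ... via the normal equations.
    The tiny ridge term is an assumption that only keeps X^T X invertible."""
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) + (ridge if i == j else 0.0)
            for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * t for x, t in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


def predict(beta, row):
    return beta[0] + sum(b * v for b, v in zip(beta[1:], row))


def jackknife_accuracy(data, labels, mask):
    """Leave-one-out accuracy of the MLR classifier on the masked features."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0
    rows = [[r[j] for j in cols] for r in data]
    correct = 0
    for i in range(len(rows)):
        beta = fit_mlr(rows[:i] + rows[i + 1:], labels[:i] + labels[i + 1:])
        pred = 1 if predict(beta, rows[i]) >= 0.5 else 0
        correct += pred == labels[i]
    return correct / len(rows)


def ga_select(data, labels, pop_size=12, generations=15, p_mut=0.05, seed=0):
    """Evolve binary feature masks; fitness = jackknife accuracy (cached)."""
    rng = random.Random(seed)
    n_feat = len(data[0])
    cache = {}

    def fitness(mask):
        key = tuple(mask)
        if key not in cache:
            cache[key] = jackknife_accuracy(data, labels, mask)
        return cache[key]

    pop = [[rng.random() < 0.5 for _ in range(n_feat)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        new_pop = ranked[:2]  # elitism: carry the two fittest masks over
        while len(new_pop) < pop_size:
            p1 = max(rng.sample(ranked, 3), key=fitness)  # tournament selection
            p2 = max(rng.sample(ranked, 3), key=fitness)
            cut = rng.randrange(1, n_feat)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # bit-flip mutation
            child = [(not b) if rng.random() < p_mut else b for b in child]
            new_pop.append(child)
        pop = new_pop
        best = max(pop + [best], key=fitness)
    return best, fitness(best)


if __name__ == "__main__":
    # Hypothetical toy data: 6 features, label depends only on features 0 and 2.
    rng = random.Random(1)
    data = [[rng.random() for _ in range(6)] for _ in range(24)]
    labels = [1 if r[0] + r[2] > 1.0 else 0 for r in data]
    mask, acc = ga_select(data, labels)
    print("selected:", [j for j, b in enumerate(mask) if b], "accuracy:", acc)
```

Caching fitness by mask matters because every jackknife evaluation refits the model once per protein. On the 1,708-protein benchmark (915 + 793), a single uncached fitness call would mean 1,708 regressions.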
