Department of Biology & Bioinformatics Research Group, University of Qom, Qom, Iran.
PLoS One. 2011;6(8):e23146. doi: 10.1371/journal.pone.0023146. Epub 2011 Aug 10.
The engineering of thermostable enzymes is receiving increased attention. The paper, detergent, and biofuel industries, in particular, seek to use environmentally friendly enzymes instead of toxic chlorine chemicals. Enzymes typically function at temperatures below 60°C and denature if exposed to higher temperatures. In contrast, a small portion of enzymes can withstand higher temperatures as a result of various structural adaptations. Understanding the protein attributes that are involved in this adaptation is the first step toward engineering thermostable enzymes. We employed various supervised and unsupervised machine learning algorithms, as well as attribute weighting approaches, to find amino acid composition attributes that contribute to enzyme thermostability. Specifically, we compared two groups of enzymes: mesostable and thermostable enzymes. Furthermore, a combination of attribute weighting with supervised and unsupervised clustering algorithms was used for prediction and modelling of protein thermostability from amino acid composition properties. Mining a large number of protein sequences (2090) with a variety of machine learning algorithms, based on the analysis of more than 800 amino acid attributes, increased the accuracy of this study. Moreover, these models were successful in predicting thermostability from the primary structure of proteins. The results showed that expectation maximization clustering, in combination with the uncertainty and correlation attribute weighting algorithms, can classify thermostable and mesostable proteins with 100% accuracy. Seventy per cent of the weighting methods selected Gln content and the frequency of hydrophilic residues as the most important protein attributes. At the dipeptide level, the frequency of Asn-Glu was the key factor in distinguishing mesostable from thermostable enzymes.
This study demonstrates the feasibility of predicting thermostability irrespective of sequence similarity and will serve as a basis for engineering thermostable enzymes in the laboratory.
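To make the attributes named in the abstract concrete, the following is a minimal sketch of how Gln content, the frequency of hydrophilic residues, and the Asn-Glu dipeptide frequency could be computed from a primary sequence. It is not the authors' pipeline: the function name `composition_features`, the feature keys, the toy sequence, and in particular the membership of the `HYDROPHILIC` set are assumptions made for illustration, and the study's own definition of hydrophilic residues may differ.

```python
from collections import Counter

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Assumption: one common polar/charged grouping. The paper's exact
# definition of "hydrophilic residues" is not given in the abstract.
HYDROPHILIC = set("RNDQEHKST")


def composition_features(sequence):
    """Return a few amino acid composition attributes for one protein.

    Hypothetical helper for illustration only; it covers a handful of
    attributes, not the 800+ attributes analysed in the study.
    """
    seq = sequence.upper()
    n = len(seq)
    if n < 2:
        raise ValueError("sequence must contain at least two residues")

    counts = Counter(seq)

    # Single-residue frequencies, e.g. features["freq_Q"] is Gln content.
    features = {f"freq_{aa}": counts[aa] / n for aa in AMINO_ACIDS}

    # Fraction of residues that fall in the assumed hydrophilic set.
    features["freq_hydrophilic"] = sum(counts[aa] for aa in HYDROPHILIC) / n

    # Overlapping dipeptides: a sequence of n residues has n - 1 of them.
    dipeptides = Counter(seq[i:i + 2] for i in range(n - 1))
    features["freq_dipeptide_NE"] = dipeptides["NE"] / (n - 1)  # Asn-Glu

    return features


# Toy sequence, not taken from the study's data set.
example = composition_features("MNEQKNEQ")
print(example["freq_Q"], example["freq_hydrophilic"], example["freq_dipeptide_NE"])
```

Feature vectors of this kind, computed for every sequence, are what an attribute weighting or clustering step would take as input; the weighting and expectation maximization steps themselves are not shown here.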