Zhang Guangya, Li Hongchun, Gao Jiaqiang, Fang Baishan
Department of Biotechnology and Bioengineering, Huaqiao University, Xiamen 361021, China.
Sheng Wu Gong Cheng Xue Bao. 2008 Nov;24(11):1968-74.
Lipases are widely used enzymes in biotechnology. Although they catalyze the same reaction, their sequences vary. A fast and reliable method is therefore highly desirable for identifying the types of lipases from their sequences, or even just for confirming whether a protein is a lipase at all. Two scale-based pseudo amino acid composition approaches were proposed to extract sequence features, and a predictor based on the k-nearest neighbor algorithm was introduced to address both problems. The overall success rates obtained by the 10-fold cross-validation test were as follows: for discriminating lipases from nonlipases, the success rates were 92.8%, 91.4% and 91.3%, respectively; for predicting lipase types, they were 92.3%, 90.3% and 89.7%, respectively. Among the approaches, the Z-scale-based pseudo amino acid composition performed best, followed by the T-scale-based one. Both significantly outperformed six other frequently used sequence feature extraction methods. The high success rates obtained on such a stringent dataset indicate that predicting the types of lipases is feasible, and that pseudo amino acid composition built on different scales may be a useful tool for extracting features from protein sequences, or can at least play a complementary role to many existing approaches.
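The feature extraction and classification steps described in the abstract can be illustrated with a minimal sketch. It assumes the commonly used form of pseudo amino acid composition, in which 20 normalized residue frequencies are extended by `lam` sequence-order correlation factors computed from per-residue descriptors. The abstract does not give the descriptor values, the number of correlation tiers, the weight factor, or the value of k, so `TOY_SCALES`, `lam`, `weight`, `pseaac` and `knn_predict` are all hypothetical names and settings chosen for illustration. In particular, `TOY_SCALES` holds placeholder numbers, not the published Z-scale or T-scale values.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Placeholder two-component descriptors, NOT the published Z or T scales.
# Substitute a real descriptor table (residue -> tuple of values) here.
TOY_SCALES = {
    aa: (i / 19.0, ((i * 7) % 20) / 19.0) for i, aa in enumerate(AMINO_ACIDS)
}


def pseaac(seq, scales, lam=2, weight=0.05):
    """Return a (20 + lam)-dimensional pseudo amino acid composition vector."""
    # Keep only residues that have descriptors (drops X, gaps, etc.).
    residues = [aa for aa in seq.upper() if aa in scales]
    if len(residues) <= lam:
        raise ValueError("sequence must be longer than lam")

    # Conventional amino acid composition: 20 residue frequencies.
    counts = Counter(residues)
    freqs = [counts[aa] / len(residues) for aa in AMINO_ACIDS]

    # Sequence-order correlation factors for tiers k = 1 .. lam: the mean,
    # over residue pairs k positions apart, of the mean squared difference
    # between their descriptor values.
    thetas = []
    for k in range(1, lam + 1):
        total = 0.0
        for i in range(len(residues) - k):
            a, b = scales[residues[i]], scales[residues[i + k]]
            total += sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)
        thetas.append(total / (len(residues) - k))

    # Joint normalization so that all components sum to 1.
    denom = sum(freqs) + weight * sum(thetas)
    return [f / denom for f in freqs] + [weight * t / denom for t in thetas]


def knn_predict(train, query, k=1):
    """Majority vote among the k training vectors closest to query."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Toy usage with made-up sequences and labels (not real lipase data).
train = [
    (pseaac("AAAACCCCAAAA", TOY_SCALES), "class_a"),
    (pseaac("WWWWYYYYWWWW", TOY_SCALES), "class_b"),
]
print(knn_predict(train, pseaac("AAACCCAAAC", TOY_SCALES)))
```

To approximate the evaluation in the abstract, the labeled vectors would be split into ten folds, with each fold held out once as the query set while the remaining nine serve as `train`.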