Zhou Guo-Ping, Cai Yu-Dong
Center for Vascular Biology Research, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, Massachusetts 02115, USA.
Proteins. 2006 May 15;63(3):681-4. doi: 10.1002/prot.20898.
Proteases play a vitally important role in regulating most physiological processes. Different types of proteases perform different functions in different biological processes. Therefore, it is highly desirable to develop a fast and reliable means of identifying the type of a protease from its sequence, or even of identifying whether a protein is a protease or a nonprotease. The avalanche of protein sequences generated in the postgenomic era has made this challenge even more critical and urgent. By hybridizing the gene ontology approach and the pseudo amino acid composition approach, a powerful predictor called the GO-PseAA predictor was introduced to address these problems. To avoid redundancy and bias, demonstrations were performed on a dataset in which no protein has ≥25% sequence identity to any other. The overall success rate obtained by the jackknife cross-validation test was 91.82% in identifying protease versus nonprotease, and 85.49% in identifying the protease type among the following five types: (1) aspartic, (2) cysteine, (3) metallo, (4) serine, and (5) threonine. The high jackknife success rates obtained on such a stringent dataset indicate that the GO-PseAA predictor is very powerful and might become a useful tool in bioinformatics and proteomics.
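The jackknife test cited above is leave-one-out cross-validation: each protein in the dataset is held out in turn, the predictor is built from all remaining proteins, and the held-out protein is scored; the success rate is the fraction predicted correctly. The sketch below illustrates only that evaluation protocol. The abstract does not specify the classifier, so the nearest-neighbor rule, the function names `nearest_neighbor_predict` and `jackknife_success_rate`, and the toy two-dimensional feature vectors (standing in for real GO-PseAA vectors) are all hypothetical.

```python
from typing import Callable, Sequence

Vector = Sequence[float]


def nearest_neighbor_predict(train_x: list[Vector], train_y: list[str], query: Vector) -> str:
    """Hypothetical classifier (not the paper's): return the label of the
    training vector closest to `query` by squared Euclidean distance."""
    best = min(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], query)),
    )
    return train_y[best]


def jackknife_success_rate(
    xs: list[Vector],
    ys: list[str],
    predict: Callable[[list[Vector], list[str], Vector], str] = nearest_neighbor_predict,
) -> float:
    """Leave-one-out (jackknife) test: hold out each sample once, predict it
    from all the others, and return the fraction predicted correctly."""
    correct = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]  # every sample except the i-th
        train_y = ys[:i] + ys[i + 1:]
        if predict(train_x, train_y, xs[i]) == ys[i]:
            correct += 1
    return correct / len(xs)


# Toy data (hypothetical): the last point is labeled "protease" but lies
# closest to a "nonprotease" point, so it is misclassified when held out.
xs = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [1.1, 1.0], [0.9, 0.9]]
ys = ["protease", "protease", "nonprotease", "nonprotease", "protease"]
print(jackknife_success_rate(xs, ys))  # 0.8
```

Because every sample is held out exactly once, the jackknife result is deterministic for a given dataset and classifier, unlike randomly partitioned k-fold tests; the cost is one model fit per sample.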