Miranda-Saavedra Diego, Barton Geoffrey J
School of Life Sciences Research, University of Dundee, Dow Street, Dundee DD1 5EH, Scotland, UK.
Proteins. 2007 Sep 1;68(4):893-914. doi: 10.1002/prot.21444.
Reversible protein phosphorylation by protein kinases and phosphatases is a ubiquitous signaling mechanism in all eukaryotic cells. A multilevel hidden Markov model library is presented that classifies protein kinases into one of 12 families, with a misclassification rate of zero on the characterized kinomes of H. sapiens, M. musculus, D. melanogaster, C. elegans, S. cerevisiae, D. discoideum, and P. falciparum. The Library is shown to outperform BLASTP and a general Pfam hidden Markov model of the kinase catalytic domain in the retrieval and family-level classification of protein kinases. The application of the Library to the 38 unclassified kinases of yeast enriches the yeast kinome in protein kinases of the families AGC (5), CAMK (17), CMGC (4), and STE (1), thereby raising the family-level classification of yeast conventional protein kinases from 66.96% to 90.43%. The application of the Library to 21 eukaryotic genomes shows seven families (AGC, CAMK, CK1, CMGC, STE, PIKK, and RIO) to be present in all genomes analyzed; these families are therefore likely to be essential to eukaryotes. Putative tyrosine kinases (TKs) are found in the plants A. thaliana (2), O. sativa ssp. Indica (6), and O. sativa ssp. Japonica (7), and in the amoeba E. histolytica (7). To our knowledge, TKs have not been predicted in plants before. This finding also suggests that a primitive set of TKs might have predated the radiation of eukaryotes. Putative tyrosine kinase-like kinases (TKLs) are found in the fungi C. neoformans (2) and P. chrysosporium (4), in the apicomplexans C. hominis (4), P. yoelii (4), and P. falciparum (6), in the amoeba E. histolytica (109), and in the alga T. pseudonana (6). TKLs are found to be abundant in plants (776 in A. thaliana, 1010 in O. sativa ssp. Indica, and 969 in O. sativa ssp. Japonica). TKLs might also have predated the radiation of eukaryotes and have been lost secondarily from some fungi. The application of the Library facilitates the annotation of kinomes and has provided novel insights into the early evolution and subsequent adaptations of the various protein kinase families in eukaryotes.
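The yeast figures quoted in the abstract can be cross-checked with a little arithmetic. The sketch below is only a consistency check: the per-family counts and the two percentages come from the abstract, whereas the total number of conventional yeast protein kinases is not stated there and is inferred here from those numbers, so it should be read as an assumption.

```python
# Previously unclassified yeast kinases assigned to each family (from the abstract).
newly_classified = {"AGC": 5, "CAMK": 17, "CMGC": 4, "STE": 1}
before_pct, after_pct = 66.96, 90.43  # family-level classification, in percent

added = sum(newly_classified.values())

# ASSUMPTION: the total is not given in the abstract; it is inferred from the
# size of the percentage jump produced by the newly classified kinases.
implied_total = round(added / ((after_pct - before_pct) / 100))

classified_before = round(implied_total * before_pct / 100)
classified_after = classified_before + added


def pct(n, total):
    """Percentage of n out of total, rounded to two decimals."""
    return round(100 * n / total, 2)


print(f"newly classified: {added} of 38")
print(f"implied total of conventional kinases: {implied_total}")
print(f"before: {classified_before}/{implied_total} = {pct(classified_before, implied_total)}%")
print(f"after:  {classified_after}/{implied_total} = {pct(classified_after, implied_total)}%")
```

Under this inference, the 27 newly assigned kinases account for the whole increase, and 11 of the 38 previously unclassified kinases remain without a family assignment.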