Jadeau Fanny, Bechet Emmanuelle, Cozzone Alain J, Deléage Gilbert, Grangeasse Christophe, Combet Christophe
Institut de Biologie et Chimie des Protéines; UMR5086, CNRS, Université Lyon 1, IFR128 BioSciences Lyon-Gerland, 7, passage du Vercors, 69367 Lyon CEDEX 07, France.
Bioinformatics. 2008 Nov 1;24(21):2427-30. doi: 10.1093/bioinformatics/btn462. Epub 2008 Sep 3.
Most protein tyrosine kinases found in bacteria have recently been classified into a new family, termed BY-kinase. They share no sequence homology with eukaryotic protein tyrosine kinases and have no known eukaryotic homologues. They are involved in several biological functions (e.g. capsule biosynthesis, antibiotic resistance, virulence mechanisms) and can therefore be considered interesting therapeutic targets for the development of new drugs against infectious diseases. However, their identification is difficult because their structural characterization has progressed slowly, and it most often relies on biochemical experiments. Moreover, BY-kinase sequences are related to those of many other bacterial proteins involved in various biological functions (e.g. ParA family proteins). Consequently, their annotation in generalist databases, their sequence analysis and their classification remain partial and inconsistent, and no bioinformatics resource is dedicated to these proteins.
BY-kinase sequences were identified in the UniProt Knowledgebase by combining similarity search with sequence-profile alignment, pattern matching and a sliding-window computation that detects the tyrosine cluster. The results were cross-validated by keyword searches, by pattern matching with several patterns and by checking motif conservation in multiple sequence alignments. Our pipeline identified 640 sequences as BY-kinases and allowed the definition of a PROSITE pattern that serves as the signature of the BY-kinases. The sequences identified as BY-kinases share good sequence similarity with BY-kinases that have already been biochemically characterized, and all of them carry the characteristic motifs of the catalytic domain, including the three Walker-like motifs followed by a tyrosine cluster.
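The sliding-window step mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the window length, the tyrosine threshold or the scanned region, so `window`, `min_tyr` and `start_at` below are hypothetical parameters chosen only for illustration, and the function names are likewise invented.

```python
def tyrosine_windows(seq, window=20, min_tyr=3):
    """Return (start, end, count) for each window with at least min_tyr tyrosines.

    window and min_tyr are illustrative assumptions, not published values.
    Coordinates are 0-based, end-exclusive.
    """
    seq = seq.upper()
    hits = []
    if len(seq) < window:
        return hits
    # Count tyrosines (Y) in the first window, then update incrementally.
    count = seq[:window].count("Y")
    if count >= min_tyr:
        hits.append((0, window, count))
    for start in range(1, len(seq) - window + 1):
        # Slide by one residue: drop the residue leaving, add the one entering.
        count -= seq[start - 1] == "Y"
        count += seq[start + window - 1] == "Y"
        if count >= min_tyr:
            hits.append((start, start + window, count))
    return hits


def has_tyrosine_cluster(seq, start_at=0, window=20, min_tyr=3):
    """True if any window at or after position start_at is tyrosine-rich.

    start_at is a hypothetical hook for scanning only the region that
    follows the last Walker-like motif.
    """
    return bool(tyrosine_windows(seq[start_at:], window, min_tyr))
```

In a pipeline of the kind described, a check like `has_tyrosine_cluster` would presumably be applied only to candidates that already passed the similarity search and pattern matching steps, with `start_at` set to the end of the last matched motif.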