Institute of Technical Biochemistry, University of Stuttgart, Allmandring 31, Stuttgart, 70569, Germany.
BMC Biochem. 2012 Nov 17;13:24. doi: 10.1186/1471-2091-13-24.
Standard numbering schemes for families of homologous proteins allow the unambiguous identification of functionally and structurally relevant residues, the communication of results on mutations, and the systematic analysis of sequence-function relationships in protein families. Such schemes have been successfully implemented for several protein families, including lactamases and antibodies. A numbering scheme is still missing, however, for the structural family of thiamine diphosphate (ThDP)-dependent decarboxylases, a large subfamily of the ThDP-dependent enzymes that encompasses pyruvate, benzoylformate, 2-oxo acid, indolepyruvate and phenylpyruvate decarboxylases, benzaldehyde lyase, acetohydroxyacid synthases and 2-succinyl-5-enolpyruvyl-6-hydroxy-3-cyclohexadiene-1-carboxylate synthase (MenD). Although the members of the ThDP-dependent decarboxylases are structurally highly similar, their sequences are diverse, which makes pairwise sequence comparison of family members difficult.
We developed and validated a standard numbering scheme for the family of ThDP-dependent decarboxylases. A profile hidden Markov model (HMM) was built from a set of representative sequences of the family. The pyruvate decarboxylase from S. cerevisiae (ScPDC; PDB: 2VK8) was chosen as the reference because it is a well-characterized enzyme. The crystal structure 2VK8 comprises the ScPDC mutant E477Q, the cofactors ThDP and Mg²⁺, and the substrate analogue (2S)-2-hydroxypropanoic acid. The absolute numbering of this reference sequence was transferred to all members of the ThDP-dependent decarboxylase protein family. The numbering scheme was then integrated into the established Thiamine-diphosphate dependent Enzyme Engineering Database (TEED) and used to systematically analyze functionally and structurally relevant positions in the superfamily of ThDP-dependent decarboxylases.
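The transfer of the reference numbering can be illustrated with a minimal Python sketch. It assumes that the reference and a target sequence have already been aligned (for example via the profile HMM) and are available as equal-length gapped strings. The function name `transfer_numbering`, the toy sequences, and the `N.k` labels for residues inserted relative to the reference are assumptions made for illustration; they are not taken from the published scheme.

```python
def transfer_numbering(ref_aligned, target_aligned, ref_start=1, gap="-"):
    """Label each target residue with the number of the reference residue
    in the same alignment column.

    Target residues that fall into a gap of the reference (insertions) get
    the hypothetical label "<last reference number>.<k>".
    Returns a list of (target_residue, label) tuples.
    """
    if len(ref_aligned) != len(target_aligned):
        raise ValueError("aligned sequences must have equal length")

    labels = []
    ref_pos = ref_start - 1  # number of the last reference residue seen
    insertion = 0            # insertion counter since that residue

    for ref_res, tgt_res in zip(ref_aligned, target_aligned):
        if ref_res != gap:
            ref_pos += 1
            insertion = 0
            if tgt_res != gap:
                labels.append((tgt_res, str(ref_pos)))
        elif tgt_res != gap:
            insertion += 1
            labels.append((tgt_res, f"{ref_pos}.{insertion}"))
        # columns that are gaps in both sequences carry no residue
    return labels


# Toy alignment (made-up sequences, not real decarboxylase data)
reference = "MSE-ITLG"
target = "M-EAIT-G"
numbered = transfer_numbering(reference, target)
```

In this toy case the target residue `A` sits in a reference gap and is labeled `3.1`, while the last `G` keeps reference number `7` even though the target lacks the residues at positions 2 and 6.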
The numbering scheme serves as a tool for the reliable sequence alignment of ThDP-dependent decarboxylases and for the unambiguous identification and communication of corresponding positions. It thus provides the basis for the systematic and automated analysis of sequence-encoded properties, such as the structural and functional relevance of amino acid positions: the analysis of conserved positions, the identification of correlated mutations and the determination of subfamily-specific amino acid distributions all depend on reliable multisequence alignments and on the unambiguous identification of the alignment columns. The method is reliable and robust and can easily be adapted to further protein families.
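Once every sequence carries standard position labels, per-position statistics reduce to simple counting. The following Python sketch shows one way such an analysis of conserved positions might look. The function names, the input layout (lists of `(residue, label)` pairs) and the toy data are assumptions for illustration only, not the implementation used in TEED.

```python
from collections import Counter, defaultdict


def position_distributions(numbered_seqs):
    """Count amino acids per standard position.

    numbered_seqs: iterable of sequences, each a list of (residue, label)
    pairs, where label is the standard position (hypothetical layout).
    """
    dist = defaultdict(Counter)
    for seq in numbered_seqs:
        for residue, label in seq:
            dist[label][residue] += 1
    return dist


def conservation(dist, label, n_seqs):
    """Return (most frequent residue, its fraction over all n_seqs sequences).

    Sequences lacking the position count against conservation.
    """
    counts = dist.get(label)
    if not counts or n_seqs <= 0:
        return None, 0.0
    residue, count = counts.most_common(1)[0]
    return residue, count / n_seqs


# Toy data (made up): three sequences, two standard positions
seqs = [
    [("G", "1"), ("E", "2")],
    [("G", "1"), ("E", "2")],
    [("G", "1"), ("D", "2")],
]
dist = position_distributions(seqs)
top_1 = conservation(dist, "1", len(seqs))
top_2 = conservation(dist, "2", len(seqs))
```

Because the counts are keyed by standard position rather than by each protein's own numbering, the same function can be run on any subfamily subset to compare amino acid distributions at corresponding positions.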