Gille Christoph, Goede Andrean, Schlöetelburg Cord, Preissner Robert, Kloetzel Peter Michael, Göbel Ulf B, Frömmel Cornelius
Institute of Biochemistry, Medical Faculty Charité, Humboldt-University, D-10117, Berlin, Germany.
J Mol Biol. 2003 Mar 7;326(5):1437-48. doi: 10.1016/s0022-2836(02)01470-5.
Proteasomes are large multimeric self-compartmentalizing proteases that play a crucial role in the clearance of misfolded proteins, the breakdown of regulatory proteins, the processing of proteins by specific partial proteolysis, cell cycle control, and the preparation of peptides for immune presentation. Two main types can be distinguished by their tertiary structures: the 20S proteasome and the proteasome-like heat shock protein encoded by heat shock locus V, hslV. Usually, each biological kingdom is characterized by its specific type of proteasome: 20S proteasomes occur in eukarya and archaea, whereas the hslV protease is prevalent in bacteria. To verify this rule, we applied a genome-wide sequence search to identify proteasomal sequences in data from finished and unfinished genome projects. We found several exceptions to this paradigm: (1) Protista: in addition to the 20S proteasome, Leishmania, Trypanosoma and Plasmodium contained hslV, which may have been acquired from an alpha-proteobacterial progenitor of mitochondria. (2) Bacteria: Magnetospirillum magnetotacticum and Enterococcus faecium each contained two distinct hslVs, resulting from gene duplication or horizontal transfer. By including unassembled data in the analyses, we confirmed that a number of bacterial genomes contain no proteasomal sequence at all, owing to gene loss. (3) High G+C Gram-positives: we confirmed that high G+C Gram-positives possess 20S proteasomes rather than hslV proteases. The core of the 20S proteasome consists of two distinct main types of homologous monomers, alpha and beta, each of which differentiated into seven subtypes through further gene duplications. By examining the genome of the intracellular pathogen Encephalitozoon cuniculi, we were able to show that differentiation of beta-type subunits into different subtypes occurred earlier than that of alpha-type subunits.
Additionally, our search strategy had an important methodological consequence: a comprehensive sequence search for a particular protein should also include the raw sequence data when possible, because proteins present in the raw data might be missed in the completed, assembled genome. The structure-based multiple proteasomal alignment of 433 sequences from 143 organisms can be downloaded from the URL † and will be updated regularly.
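The methodological point about raw sequence data can be illustrated with a minimal sketch. It is not the search procedure used in the study: the peptide `MKWDE`, the sequence names, and the helper functions are hypothetical, and an exact substring match on six-frame translations stands in for a real homology search. The sketch only shows how a coding sequence present in an unassembled read can be absent from the assembled contigs.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna, offset):
    """Translate one reading frame; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")
        for i in range(offset, len(dna) - 2, 3)
    )


def six_frame(dna):
    """Return the translations of all six reading frames."""
    strands = (dna, reverse_complement(dna))
    return [translate(s, off) for s in strands for off in range(3)]


def find_hits(records, peptide):
    """Names of records with the peptide in any reading frame.

    Exact matching is a placeholder (assumption) for a homology search.
    """
    return [
        name for name, dna in records.items()
        if any(peptide in frame for frame in six_frame(dna))
    ]


# Hypothetical toy data: the gene fragment survives only in a raw read,
# and there it lies on the reverse strand.
gene = "ATGAAATGGGATGAA"  # encodes the hypothetical peptide MKWDE
assembled = {"contig_1": "ACGTACGTACGTACGTACGT"}
raw_reads = {
    "read_1": "TTTTTTTTTTTT",
    "read_2": "GG" + reverse_complement(gene) + "C",
}

hits_assembled = find_hits(assembled, "MKWDE")
hits_raw = find_hits(raw_reads, "MKWDE")
```

In this toy case the search over `assembled` finds nothing while the search over `raw_reads` reports `read_2`, which is the situation the abstract warns about.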