Department of Cellular and Molecular Neurobiology, Nencki Institute of Experimental Biology, Polish Academy of Sciences, Warsaw, Poland.
PLoS One. 2013 May 9;8(5):e62272. doi: 10.1371/journal.pone.0062272. Print 2013.
The zincins, zinc-dependent metalloproteases with a His-Glu-x-x-His (HExxH) active-site motif, are a broad group of proteins involved in many metabolic and regulatory functions and are found in all forms of life. The human genome contains more than 100 genes encoding proteins with known zincin-like domains. A survey of all proteins containing the HExxH motif shows that approximately 52% of HExxH occurrences fall within known protein structural domains (as defined in the Pfam database). Domain families in which a majority of members possess a conserved HExxH motif include, not surprisingly, many known and putative metalloproteases. Furthermore, several of the HExxH-containing protein domains thus identified can be confidently predicted to be putative peptidases of the zincin fold. On this basis, we predict a zincin-like fold for eight uncharacterised Pfam families. Besides the domains in which the HExxH motif is strictly conserved, and those in which it occurs only sporadically, intermediate families are identified that contain some members with a conserved HExxH motif but also many homologues with substitutions at the conserved positions. Such substitutions can be evolutionarily conserved and non-random, yet the functional roles of these inactive zincins are not known. The CLCAs are a novel zincin-like protease family with many cases of substituted active sites. We show that this allegedly metazoan family has a number of bacterial and archaeal members. The extremely patchy phylogenetic distribution of CLCAs in prokaryotes and their conserved protein domain composition strongly suggest an evolutionary scenario of horizontal gene transfer (HGT) from multicellular eukaryotes to bacteria, providing an example of eukaryote-derived xenologues in bacterial genomes. Additionally, in a protein family identified here as closely homologous to CLCA, the CLCA_X (CLCA-like) family, a number of proteins are found in phages and plasmids, supporting the HGT scenario.
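The motif survey described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names, the restriction of the two "x" positions to the 20 standard amino acids, and the representation of domains as half-open (start, end) intervals are all assumptions made for illustration, and real domain boundaries would come from Pfam annotations rather than being typed in by hand.

```python
import re

# Assumption: "x" means any of the 20 standard amino acids.
# The lookahead lets overlapping occurrences be reported as well.
STANDARD_AA = "ACDEFGHIKLMNPQRSTVWY"
HEXXH = re.compile(rf"(?=(HE[{STANDARD_AA}]{{2}}H))")


def find_hexxh(sequence):
    """Return (0-based start, matched pentapeptide) for each HExxH occurrence."""
    return [(m.start(), m.group(1)) for m in HEXXH.finditer(sequence.upper())]


def fraction_in_domains(hits, domains):
    """Share of motif hits lying fully inside any half-open (start, end) domain.

    `domains` is a hypothetical stand-in for per-protein domain annotations.
    """
    if not hits:
        return 0.0
    motif_len = 5
    inside = sum(
        any(start <= pos and pos + motif_len <= end for start, end in domains)
        for pos, _ in hits
    )
    return inside / len(hits)
```

Applied across a whole proteome with real domain coordinates, a statistic of this kind is what a figure such as "approximately 52% of HExxH occurrences fall within known domains" summarises; the sketch only shows the counting logic for a single sequence.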